UNSTRUCTURED RECOMBINANT POLYMERS AND USES THEREOF

CROSS-REFERENCE 

[0001] This application claims the benefit of U.S. Provisional Application No. 60/743,410 filed March 6,
2006, which application is incorporated herein by reference. This application is a continuation-in-part
of application Ser. Nos. 11/528,927 and 11/528,950, filed on September 27, 2006, which in turn claim priority to
provisional applications serial nos. 60/721,270 and 60/721,188, filed on 9/27/2005, and 60/743,622, filed on
03/21/06, all of which are herein incorporated by reference in their entirety.

BACKGROUND OF THE INVENTION 
[0002] It has been well documented that properties of proteins, in particular plasma clearance and
immunogenicity, can be improved by attaching hydrophilic polymers to these proteins (Kochendoerfer, G.
(2003) Expert Opin Biol Ther, 3: 1253-61), (Greenwald, R. B., et al. (2003) Adv Drug Deliv Rev, 55: 217-50),
(Harris, J. M., et al. (2003) Nat Rev Drug Discov, 2: 214-21). Examples of polymer-modified proteins
that have been approved by the FDA for treatment of patients are Adagen, Oncaspar, PEG-Intron, Pegasys, 
Somavert, and Neulasta. Many more polymer-modified proteins are in clinical trials. These polymers
exert their effect by increasing the hydrodynamic radius (also called Stokes' radius) of the modified protein 
relative to the unmodified protein, which reduces the rate of clearance by kidney filtration (Yang, K., et al. 
(2003) Protein Eng, 16: 761-70). In addition, polymer attachment can reduce interaction of the modified 
protein with other proteins, cells, or surfaces. In particular, polymer attachment can reduce interactions 
between the modified protein and antibodies and other components of the immune system thus reducing the 
formation of a host immune response to the modified protein. Of particular interest is protein modification 
by PEGylation, i.e. by attaching linear or branched polymers of polyethylene glycol. Reduced 
immunogenicity upon PEGylation was shown for example for phenylalanine ammonia lyase (Gamez, A., et
al. (2005) Mol Ther, 11: 986-9), antibodies (Deckert, P. M., et al. (2000) Int J Cancer, 87: 382-90),
staphylokinase (Collen, D., et al. (2000) Circulation, 102: 1766-72), and hemoglobin (Jin, C., et al. (2004)
Protein Pept Lett, 11: 353-60). Typically, such polymers are conjugated with the protein of interest via a
chemical modification step after the unmodified protein has been purified. 
[0003] Various polymers can be attached to proteins. Of particular interest are hydrophilic polymers that have 
flexible conformations and are well hydrated in aqueous solutions. A frequently used polymer is 
polyethylene glycol (PEG). These polymers tend to have large hydrodynamic radii relative to their
molecular weight (Kubetzko, S., et al. (2005) Mol Pharmacol, 68: 1439-54). The attached polymers tend 
to have limited interactions with the protein they have been attached to and thus the polymer-modified 
protein retains its relevant functions. 
[0004] The chemical conjugation of polymers to proteins requires complex multi-step processes. Typically, the 
protein component needs to be produced and purified prior to the chemical conjugation step. The 
conjugation step can result in the formation of product mixtures that need to be separated leading to 
significant product loss. Alternatively, such mixtures can be used as the final pharmaceutical product. 
Some examples are currently marketed PEGylated interferon-alpha products that are used as mixtures
(Wang, B. L., et al. (1998) J Submicrosc Cytol Pathol, 30: 503-9; Dhalluin, C., et al. (2005) Bioconjug
Chem, 16: 504-17). Such mixtures are difficult to manufacture and characterize, and they contain isomers
with reduced or no therapeutic activity. 
[0005] Methods have been described that allow the site-specific addition of polymers like PEG. Examples are the
selective PEGylation at a unique glycosylation site of the target protein or the selective PEGylation of a
non-natural amino acid that has been engineered into the target proteins. In some cases it has been possible
to selectively PEGylate the N-terminus of a protein while avoiding PEGylation of lysine side chains in the
target protein by carefully controlling the reaction conditions. Yet another approach for the site-specific 
PEGylation of target proteins is the introduction of cysteine residues that allow selective conjugation. All 
these methods have significant limitations. The selective PEGylation of the N-terminus requires careful 
process control and side reactions are difficult to eliminate. The introduction of cysteines for PEGylation 
can interfere with protein production and/or purification. The specific introduction of non-natural amino 
acids requires specific host organisms for protein production. A further limitation of PEGylation is that 
PEG is typically manufactured as a mixture of polymers with similar but not uniform length. The same 
limitations are inherent in many other chemical polymers. 

[0006] Chemical conjugation using multifunctional polymers, which would allow the synthesis of products with
multiple protein modules, is even more complex than the polymer conjugation of a single protein domain.

[0007] Recently, it has been observed that some proteins of pathogenic organisms contain repetitive peptide
sequences that seem to lead to a relatively long serum half-life of the proteins containing these sequences
(Alvarez, P., et al. (2004) J Biol Chem, 279: 3375-81). It has also been demonstrated that oligomeric
sequences that are based on such pathogen-derived repetitive sequences can be fused to other proteins
resulting in increased serum half-life. However, these pathogen-derived oligomers have a number of
deficiencies. The pathogen-derived sequences tend to be immunogenic. It has been described that the 
sequences can be modified to reduce their immunogenicity. However, no attempts have been reported to 
remove T cell epitopes from the sequences contributing to the formation of immune reactions. 
Furthermore, the pathogen-derived sequences have not been optimized for pharmacological applications 
which require sequences with good solubility and a very low affinity for other target proteins. 

[0008] Thus there is a significant need for compositions and methods that would allow one to combine multiple 
polymer modules and multiple protein modules into defined multidomain products. 

SUMMARY OF THE INVENTION 
[0009] The present invention provides an unstructured recombinant polymer (URP) comprising at least 40
contiguous amino acids, wherein said URP is substantially incapable of non-specific binding to a serum 
protein, and wherein (a) the sum of glycine (G), aspartate (D), alanine (A), serine (S), threonine (T), 
glutamate (E) and proline (P) residues contained in the URP, constitutes more than about 80% of the total 
amino acids of the URP; and/or (b) at least 50% of the amino acids are devoid of secondary structure as 
determined by Chou-Fasman algorithm. In a related embodiment, the present invention provides an 
unstructured recombinant polymer (URP) comprising at least 40 contiguous amino acids, wherein said 
URP has an in vitro serum degradation half-life greater than about 24 hours, and wherein (a) the sum of 
glycine (G), aspartate (D), alanine (A), serine (S), threonine (T), glutamate (E) and proline (P) residues 
contained in the URP, constitutes more than about 80% of the total amino acids of the URP; and/or (b) at 
least 50% of the amino acids are devoid of secondary structure as determined by Chou-Fasman algorithm. 
The subject URP can comprise a non-natural amino acid sequence. Where desired, the URP is selected for
incorporation into a heterologous protein, wherein upon incorporation of the URP into the heterologous
protein, said heterologous protein exhibits a longer serum secretion half-life and/or higher solubility as
compared to the corresponding protein that is deficient in said URP. The half-life can be extended by
two-fold, three-fold, five-fold, ten-fold or more. In some aspects, incorporation of the URP into a
heterologous protein results in at least a 2-fold, 3-fold, 4-fold, 5-fold or more increase in apparent
molecular weight of the protein as approximated by size exclusion chromatography. In some aspects, the
URP has a Tepitope score less than -3.5 (e.g., -4 or less, -5 or less). In some aspects, the URPs can
contain predominantly hydrophilic residues. Where desired, at least 50% of the amino acids of the URP are
devoid of secondary structure as determined by Chou-Fasman algorithm. The glycine residues contained in
the URP may constitute at least about 50% of the total amino acids of the URP. In some aspects, any one
type of the amino acids alone selected from the group consisting of glycine (G), aspartate (D), alanine (A),
serine (S), threonine (T), glutamate (E) and proline (P) contained in the URP constitutes more than about
20%, 30%, 40%, 50%, 60% or more of the total amino acids of the URP. In some aspects, the URP
comprises more than about 100, 150, 200 or more contiguous amino acids.

[0010] The present invention also provides a protein comprising one or more of the subject URPs, wherein the
subject URPs are heterologous with respect to the protein. The total length of URPs in aggregate can
exceed about 40, 50, 60, 100, 150, 200, or more amino acids. The protein can comprise one or more
functional modules selected from the group consisting of effector module, binding module, N-terminal
module, C-terminal module, and any combinations thereof. Where desired, the subject protein comprises a
plurality of binding modules, wherein the individual binding modules exhibit binding specificities to the
same or different targets. The binding module may comprise a disulfide-containing scaffold formed by
intra-scaffold pairing of cysteines. The binding module may bind to a target molecule, wherein the target is selected
from the group consisting of cell surface protein, secreted protein, cytosolic protein, and nuclear protein.
The target can be an ion channel and/or GPCR. Where desired, the effector module can be a toxin. The
subject URP-containing protein typically has a serum secretion half-life extended by at least 2-, 3-, 4-, 5-, 10-fold or
more as compared to a corresponding protein that is deficient in said URP.

[0011] In a separate embodiment, the present invention provides a non-naturally occurring protein comprising at
least 3 repeating units of amino acid sequences, each of the repeating units comprising at least 6 amino
acids, wherein the majority of segments comprising about 6 to about 15 contiguous amino acids of the at
least 3 repeating units are present in one or more native human proteins. In one aspect, the majority of the
segments, or each segment comprising about 9 to about 15 contiguous amino acids within the repeating
units, are present in one or more native human proteins. The segments can comprise about 9 to about 15
amino acids. The three repeating units may share substantial sequence homology, e.g., share sequence
identity of greater than about 50%, 60%, 70%, 80%, 90% or 100% when aligned. Such non-natural protein
may also comprise one or more modules selected from the group consisting of binding modules, effector
modules, multimerization modules, C-terminal modules, and N-terminal modules. Where desired, the
non-natural protein may comprise individual repeating units having the subject unstructured recombinant
polymer (URP).
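
The segment criterion described above can be sketched in Python as follows. The sketch concatenates a repeating unit, enumerates every contiguous segment of a chosen length (9 residues by default, including segments that span the junction between neighboring units), and reports the fraction of segments that occur in at least one donor sequence. The function names, the repeating unit and the donor sequence are hypothetical placeholders chosen for illustration; they are not sequences from this disclosure, and a real analysis would search a database of native human proteins instead of a short list of strings.

```python
from typing import List


def kmers(sequence: str, k: int) -> List[str]:
    """Return all contiguous segments of length k, in order of occurrence."""
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


def fraction_of_segments_in_donors(repeat_unit: str,
                                   n_repeats: int,
                                   donors: List[str],
                                   k: int = 9) -> float:
    """Fraction of k-residue segments of the repeat protein found in any donor.

    Segments spanning the junction between neighboring repeat units are
    included because the k-mers are taken from the concatenated protein.
    """
    protein = repeat_unit * n_repeats
    segments = kmers(protein, k)
    if not segments:
        raise ValueError("protein is shorter than the segment length k")
    found = sum(1 for seg in segments if any(seg in donor for donor in donors))
    return found / len(segments)


if __name__ == "__main__":
    unit = "GSEGSE"                          # hypothetical 6-residue repeating unit
    donors = ["AAGSEGSEGSEGSEGSEGSEAA"]      # hypothetical donor sequence
    print(fraction_of_segments_in_donors(unit, 3, donors, k=9))
```

A value above 0.5 would correspond to "the majority of segments" being present in the donor set, and a value of 1.0 to every segment being present.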

[0012] The present invention also provides recombinant polynucleotides comprising coding sequences that encode 
the subject URPs, URP-containing proteins, microproteins and toxins. Also provided in the present 
invention are vectors containing the subject polynucleotides, host cells harboring the vectors, genetic 
packages displaying the subject URPs, URP-containing proteins, toxins and any other proteinaceous 
entities disclosed herein. Further provided is a selectable library of expression vectors of the present
invention.
[0013] The present invention also provides a method of producing a protein comprising an unstructured recombinant
polymer (URP). The method involves (i) providing a host cell comprising a recombinant polynucleotide 
encoding the protein, said protein comprising one or more URPs, said URP comprising at least 40
contiguous amino acids, wherein said URP is substantially incapable of non-specific binding to a serum 
protein, and wherein (a) the sum of glycine (G), aspartate (D), alanine (A), serine (S), threonine (T), 
glutamate (E) and proline (P) residues contained in the URP, constitutes more than about 80% of the total 
amino acids of the URP; and/or (b) at least 50% of the amino acids are devoid of secondary structure as 
determined by Chou-Fasman algorithm; and (ii) culturing said host cell in a suitable culture medium under 
conditions to effect expression of said protein from said polynucleotide. Suitable host cells are eukaryotic 
(e.g., CHO cells) and prokaryotic cells. 
[0014] The present invention also provides a method of increasing serum secretion half-life of a protein,
comprising: fusing said protein with one or more unstructured recombinant polymers (URPs), wherein the 
URP comprises at least about 40 contiguous amino acids, and wherein (a) the sum of glycine (G), aspartate 
(D), alanine (A), serine (S), threonine (T), glutamate (E) and proline (P) residues contained in the URP, 
constitutes more than about 80% of the total amino acids of the URP; and/or (b) at least 50% of the amino 
acids are devoid of secondary structure as determined by Chou-Fasman algorithm; and wherein said URP is 
substantially incapable of non-specific binding to a serum protein. 
[0015] Also provided in the present invention is a method of detecting the presence or absence of a specific
interaction between a target and an exogenous protein that is displayed on a genetic package, wherein said
protein comprises one or more unstructured recombinant polymers (URPs), the method comprising: (a)
providing a genetic package displaying a protein that comprises one or more unstructured recombinant 
polymers (URPs); (b) contacting the genetic package with the target under conditions suitable to produce a 
stable protein-target complex; and (c) detecting the formation of the stable protein-target complex on the 
genetic package, thereby detecting the presence of a specific interaction. The method may further
comprise obtaining a nucleotide sequence from the genetic package that encodes the exogenous protein.
In some aspects, the presence or absence of a specific interaction is between the URP and a target 
comprising a serum protein. In some aspects, the presence or absence of a specific interaction is between 
the URP and a target comprising a serum protease. 
[0016] Further included in the present invention is a genetic package displaying a microprotein, wherein said 
microprotein retains binding capability to its native target. In some aspects, the microprotein exhibits 
binding capability towards at least one family of ion channel selected from the group consisting of a 
sodium, a potassium, a calcium, an acetylcholine, and a chloride channel. Where desired, the microprotein
is an ion-channel-binding microprotein, and is modified such that (a) the microprotein binds to a different 
family of channel as compared to the corresponding unmodified microprotein; (b) the microprotein binds to 
a different subfamily of the same channel family as compared to the corresponding unmodified 
microprotein; (c) the microprotein binds to a different species of the same subfamily of channel as 
compared to the corresponding unmodified microprotein; (d) the microprotein binds to a different site on 
the same channel as compared to the corresponding unmodified microprotein; and/or (e) the microprotein 
binds to the same site of the same channel but yields a different biological effect as compared to the
corresponding unmodified microprotein. In some aspects, the microprotein is a toxin. The present
invention also provides a library of genetic packages displaying the subject microproteins and/or toxins. 
Where desired, the genetic package displays a proteinaceous toxin that retains in part or in whole its 
toxicity spectrum. The toxin can be derived from a single toxin protein, or derived from a family of toxins.
The present invention also provides a library of genetic packages wherein the library displays a family of 
toxins, wherein the family retains in part or in whole its native toxicity spectrum. 
[0017] The present invention further provides a protein comprising a plurality of ion-channel binding domains, 
wherein individual domains are microprotein domains that have been modified such that (a) the 
microprotein domains bind to a different family of channel as compared to the corresponding unmodified 
microprotein domains; (b) the microprotein domains bind to a different subfamily of the same channel 
family as compared to the corresponding unmodified microprotein domains; (c) the microprotein 
domains bind to a different species of the same subfamily as compared to the corresponding unmodified 
microprotein domains; (d) the microprotein domains bind to a different site on the same channel as 
compared to the corresponding unmodified microprotein domains; (e) the microprotein domains bind to the 
same site of the same channel but yield a different biological effect as compared to the corresponding 
unmodified microprotein domains; and/or (f) the microprotein domains bind to the same site of the same 
channel and yield the same biological effect as compared to the corresponding unmodified microprotein 
domains. 

[0018] Also embodied in the invention is a method of obtaining a microprotein with a desired property, comprising:
(a) providing a subject library; and (b) screening the selectable library to obtain at least one phage
displaying a microprotein with the desired property. Polynucleotides, vectors, genetic packages, and host cells
for use in any one of the disclosed methods are also provided.

INCORPORATION BY REFERENCE 
[0019] All publications and patent applications mentioned in this specification are herein incorporated by reference
to the same extent as if each individual publication or patent application was specifically and individually 
indicated to be incorporated by reference. 

BRIEF DESCRIPTION OF THE DRAWINGS 
[0020] The novel features of the invention are set forth with particularity in the appended claims. A better
understanding of the features and advantages of the present invention will be obtained by reference to the
following detailed description that sets forth illustrative embodiments, in which the principles of the
invention are utilized, and the accompanying drawings of which:
[0021] FIG. 1 shows the modular components of an MURP. Binding modules, effector modules, and
multimerization modules are depicted as circles. URP modules, N-terminal, and C-terminal modules are
shown as rectangles.

[0022] FIG. 2 shows examples of modular architectures of MURPs. Binding modules (BM) in one MURP can
have identical or differing target specificities.
[0023] FIG. 3 shows that a repeat protein that is based on a human sequence can contain novel amino acid
sequences, which can contain T cell epitopes. These novel sequences are formed at the junction between
neighboring repeat units.

[0024] FIG. 4 illustrates the design of a URP sequence that is a repeat protein based on three human donor
sequences D1, D2, and D3. The repeating unit of this URP was chosen such that even 9-mer sequences
that span the junction between neighboring units can be found in at least one of the human donor
sequences.
[0025] FIG. 5 shows an example of a URP sequence that is a repeat protein based on the sequences of three human
proteins. The lower portion of the figure illustrates that all 9-mer subsequences in the URP occur in at least
one of the human donor proteins.

[0026] FIG. 6 shows an example of a URP sequence based on residues 146-182 of the human POU domain.

[0027] FIG. 7 shows the advantage of separating modules with information-rich sequences by inserting URP
modules between such sequences. The left side of the figure shows that the direct fusion of modules A and 
B leads to novel sequences in the junction region. These junction sequences can be epitopes. The right 
half of the figure shows that the insertion of a URP module between module A and B prevents the 
formation of such junction sequences that contain partial sequences from modules A and B. Instead, the 
termini of modules A and B yield junction sequences that contain URP sequences and thus are predicted to 
have low immunogenicity. 

[0028] FIG. 8 shows drug delivery constructs that are based on URPs. The drug molecules depicted as hexagons
are chemically conjugated to the MURP.
[0029] FIG. 9 shows an MURP containing a protease-sensitive site. The URP module is designed such that it
blocks the effector module from its function. Protease cleavage removes a portion of the URP module and
results in increased activity of the effector function.
[0030] FIG. 10 shows how a URP module can act as a linker between a binding module and an effector module.
The binding module can bind to a target and as a consequence it increases the local concentration of the
effector module in the proximity of the target.
[0031] FIG. 11 shows a process to construct genes encoding URP sequences from libraries of short URP modules.
The URP module library can be inserted into a stuffer vector that contains green fluorescent protein (GFP)
as a reporter to facilitate the identification of URP sequences with high expression. The figure illustrates
that genes encoding long URP sequences can be built by iterative dimerization.
[0032] FIG. 12 shows MURPs that contain multiple binding modules for death receptors. Death receptors are
triggered by trimerization and thus MURPs containing at least three binding elements for one death
receptor are particularly potent in inducing cell death. The lower portion of the figure illustrates that one can
increase the specificity of the MURP for diseased tissue by adding one or more binding modules with
specificity for tumor tissue.
[0033] FIG. 13 shows a MURP that comprises four binding modules (rectangles) with specificity for a tumor
antigen with an effector module like interleukin 2.
[0034] FIG. 14 shows the flow chart for the construction of URP modules with 288 residues. The URP modules
were constructed as fusion proteins with GFP. Libraries of URP modules with 36 amino acids were
constructed first, followed by iterative dimerization to yield URP modules with 288 amino acids
(rPEG_H288 and rPEG_J288).
[0035] FIG. 15 shows the amino acid and nucleotide sequence of a URP module with 288 amino acids (rPEG_J288).
[0036] FIG. 16 shows the amino acid and nucleotide sequence of a URP module with 288 amino acids (rPEG_H288).
[0037] FIG. 17 shows the amino acid sequence of a serine-rich sequence region of the human protein dentin
sialophosphoprotein.

[0038] FIG. 18 shows a depot derivative of a MURP. The protein contains two cysteine residues that can form a
weak SS bridge. The protein can be manufactured with the SS bridge intact. It can be formulated and
injected into patients in reduced form. After injection it will be oxidized in proximity to the injection site
and as a result it can form a high molecular weight polymer with very limited diffusivity. The active
MURP can slowly leach from the injection site by limited proteolysis or limited reduction of the
cross-linking SS bond.

[0039] FIG. 19 shows a depot form of a MURP. The MURP has very limited diffusivity at the injection site and
can be liberated from the injection site by limited proteolysis.

[0040] FIG. 20 shows a depot form of a MURP that contains a histidine-rich sequence. The MURP can be
formulated and injected in combination with insoluble beads that contain immobilized nickel. The MURP
binds to the nickel beads at the injection site and is released slowly into the circulation.

[0041] FIG. 21 shows MURPs that contain multimerization modules. The upper part of the figure shows an
MURP that contains one dimerization sequence. As a result it forms a dimer which effectively doubles its
molecular weight. The center of the figure shows three MURP designs that comprise two multimerization
sequences. Such MURPs can form multimers with very high effective molecular weight. The lower part of
the figure illustrates an MURP that contains multiple RGD sequences that are known to bind to cell surface
receptors and thus confer an extended half-life.

[0042] FIG. 22 shows a variety of MURPs that are designed to block or modulate ion channel function. Circles
indicate binding modules with specificity for ion channels. These binding modules can be derived from or
identical to natural toxins with affinity for ion channel receptors. The figure illustrates that other binding
domains can be added on either side of the ion channel-specific binding modules thus conferring the 
MURPs increased efficacy or specificity for a particular cell type. 

[0043] FIG. 23 shows several MURP designs for increased half-life. Increased effective molecular weight can be 
achieved by increasing chain length (A), chemical multimerization (B), adding multiple copies of binding 
modules into a molecule separated by non-binding sites (C), construction of chemical multimers similar to
C (D, E), or inclusion of multimerization sequences (F).

[0044] FIG. 24 shows MURPs that can be formed by chemical conjugation of binding modules to a recombinant 
URP sequence. The URP sequence is designed to contain multiple lysine residues (K) as conjugation sites. 

[0045] FIG. 25 shows the design of a library of 2SS binding modules. The sequences contain a constant 1SS
sequence in the center which is flanked by random sequences that contain cysteine residues in varying
distance from the 1SS core.

[0046] FIG. 26 shows the design of a library of 2SS binding modules. The sequences contain a constant 1SS
sequence in the center which is flanked by random sequences that contain cysteine residues in varying
distance from the 1SS core.

[0047] FIG. 27 shows the design of a library of dimers of 1SS binding modules. Initially, a collection of 1SS
binding modules is amplified by two PCR reactions. The resulting PCR products are combined and dimers
are generated in a subsequent PCR step.
[0048] FIG. 28 shows the Western analysis of a fusion protein containing the 288 amino acid URP sequence
rPEG_J288 after incubation of up to 3 days in 50% mouse serum.
[0049] FIG. 29 shows results of a binding assay testing for pre-existing antibodies against a URP sequence of 288
amino acids.

[0050] FIG. 30 shows the binding of MURPs containing one (Monomer), two (Dimer), four (Tetramer), or zero 
(rPEG36) binding modules with specificity for VEGF which was coated to microtiter plates. 

[0051] FIG. 31 shows the amino acid sequence of an MURP with specificity for EpCAM. The sequence contains
four binding modules with affinity for EpCAM (underlined). The sequence contains an N-terminal Flag
sequence which contains the only two lysine residues of the entire sequence.
[0052] FIG. 32 shows the design of 1SS addition libraries. Random 1SS modules can be added to the N- or
C-terminus of a pre-selected binding module or simultaneously to both sides.
[0053] FIG. 33 shows the alignment of three-finger toxin-related sequences. The figure also shows a 3D structure
that was solved by NMR.

[0054] FIG. 34 shows the design of a three-finger toxin-based library. Residues designated X were randomized.
The codon choice for each random position is indicated.
[0055] FIG. 35 shows the alignment of plexin-related sequences.

[0056] FIG. 36 shows the design of a plexin-based library. Residues designated X were randomized. The codon
choice for each random position is indicated.
[0057] FIG. 37 shows sequences of plexin-related binding modules with specificity for DR4, ErbB2, and HGFR.
[0058] FIG. 38 shows a binding assay for microprotein-based binding domains with specificity for VEGF.
[0059] FIG. 39 shows sequences of 2SS and 3SS binding modules that were isolated from buildup libraries with
specificity for VEGF. The upper part of the figure shows PAGE gel analysis of the proteins purified by
heat-lysis.

[0060] FIG. 40 shows cloning steps to construct the URP sequence rPEG_J72. 

[0061] FIG. 41 shows the construction of a library of URP modules with 36 amino acids called rPEG_J36. The 
region encoding rPEG_J36 was assembled by ligating three shorter segments encoding rPEG_J12 and a 
stopper module. 

[0062] FIG. 42 shows the nucleotide sequence and translation of the stuffer vector pCW0051. The stuffer region
is flanked by BsaI and BbsI sites and contains multiple stop codons.
[0063] FIG. 43 shows a PAGE gel of the purification of the URP rPEG_J288 fused to GFP. Lane 2 shows the cell
lysate; lane 3: product purified by IMAC; lane 4: product purified by anti-Flag.
[0064] FIG. 44 shows the amino acid sequence of fusion proteins between rPEG_J288 and human effector domains
interferon alpha, G-CSF, and human growth hormone.
[0065] FIG. 45 shows the Western analysis of expression of fusion proteins between rPEG_J288 and human
growth hormone (lanes 1 and 2), interferon alpha (lanes 3 and 4), and GFP (lanes 5 and 6). Both soluble
and insoluble material was analyzed for each protein.
[0066] FIG. 46 shows the design of MURPs based on the toxin OSK1. The figure shows that URP sequences
and/or binding modules can be added to either side of OSK1.
[0067] FIG. 47 depicts exemplary product formats comprising the subject URPs.

DETAILED DESCRIPTION OF THE INVENTION 
[0068] While preferred embodiments of the present invention have been shown and described herein, it will be
obvious to those skilled in the art that such embodiments are provided by way of example only. Numerous
variations, changes, and substitutions will now occur to those skilled in the art without departing from the 
invention. It should be understood that various alternatives to the embodiments of the invention described 
herein may be employed in practicing the invention. It is intended that the following claims define the 
scope of the invention and that methods and structures within the scope of these claims and their 
equivalents be covered thereby. 
General Techniques: 

[0069] The practice of the present invention employs, unless otherwise indicated, conventional techniques of 
immunology, biochemistry, chemistry, molecular biology, microbiology, cell biology, genomics and 
recombinant DNA, which are within the skill of the art. See Sambrook, Fritsch and Maniatis,
MOLECULAR CLONING: A LABORATORY MANUAL, 2nd edition (1989); CURRENT PROTOCOLS
IN MOLECULAR BIOLOGY (F. M. Ausubel, et al. eds., (1987)); the series METHODS IN
ENZYMOLOGY (Academic Press, Inc.): PCR 2: A PRACTICAL APPROACH (M.J. MacPherson, B.D.
Hames and G.R. Taylor eds. (1995)), Harlow and Lane, eds. (1988) ANTIBODIES, A LABORATORY
MANUAL, and ANIMAL CELL CULTURE (R.I. Freshney, ed. (1987)).

Definitions: 

[0070] As used in the specification and claims, the singular forms "a", "an" and "the" include plural references
unless the context clearly dictates otherwise. For example, the term "a cell" includes a plurality of cells, 
including mixtures thereof. 

[0071] The terms "polypeptide", "peptide", "amino acid sequence" and "protein" are used interchangeably herein
to refer to polymers of amino acids of any length. The polymer may be linear or branched, it may comprise
modified amino acids, and it may be interrupted by non-amino acids. The terms also encompass an amino
acid polymer that has been modified, for example, by disulfide bond formation, glycosylation, lipidation,
acetylation, phosphorylation, or any other manipulation, such as conjugation with a labeling component.
As used herein the term "amino acid" refers to either natural and/or unnatural or synthetic amino acids,
including but not limited to glycine and both the D or L optical isomers, and amino acid analogs and
peptidomimetics. Standard single or three letter codes are used to designate amino acids.

[0072] A "repetitive sequence" refers to an amino acid sequence that can be described as an oligomer of repeating 
peptide sequences, forming direct repeats, or inverted repeats or alternating repeats of multiple sequence 
motifs. These repeating oligomer sequences can be identical or homologous to each other, but there can 
also be multiple repeated motifs. Repetitive sequences are characterized by a very low information content. 
A repetitive sequence is not a required feature of a URP and in some cases a non-repetitive sequence will in 
fact be preferred. 

[0073] Amino acids can be characterized based on their hydrophobicity. A number of scales have been developed.
An example is a scale developed by Levitt, M et al. (see Levitt, M (1976) J Mol Biol 104, 59, #3233, which
is listed in Hopp, TP, et al. (1981) Proc Natl Acad Sci USA 78, 3824, #3232). Examples of "hydrophilic
amino acids" are arginine, lysine, threonine, alanine, asparagine, and glutamine. Of particular interest are
the hydrophilic amino acids aspartate, glutamate, serine, and glycine. Examples of "hydrophobic amino
acids" are tryptophan, tyrosine, phenylalanine, methionine, leucine, isoleucine, and valine.
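As an illustration only, the residue groupings named in the preceding paragraph can be written as sets of one-letter codes and used to summarize the composition of a sequence. The function name and the example sequence are hypothetical, and the grouping simply restates the lists above rather than implementing the Levitt scale itself.

```python
# Residue groupings restated from the definition above (one-letter codes).
# Hydrophilic: Arg, Lys, Thr, Ala, Asn, Gln, Asp, Glu, Ser, Gly.
HYDROPHILIC = set("RKTANQDESG")
# Hydrophobic: Trp, Tyr, Phe, Met, Leu, Ile, Val.
HYDROPHOBIC = set("WYFMLIV")


def hydropathy_fractions(sequence):
    """Return (hydrophilic fraction, hydrophobic fraction) of a sequence.

    Residues in neither list (e.g., Cys, His, Pro) count toward the
    length but toward neither fraction.
    """
    seq = sequence.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    hydrophilic = sum(1 for aa in seq if aa in HYDROPHILIC)
    hydrophobic = sum(1 for aa in seq if aa in HYDROPHOBIC)
    return hydrophilic / len(seq), hydrophobic / len(seq)


if __name__ == "__main__":
    # Hypothetical example sequence, not taken from the specification.
    print(hydropathy_fractions("GGSEW"))
```

A sequence rich in the first group and poor in the second would be consistent with the composition described for URPs elsewhere in this specification.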

[0074] The term "denatured conformation" describes the state of a peptide in solution that is characterized by a 
large conformational freedom of the peptide backbone. Most peptides and proteins adopt a denatured 
conformation in the presence of high concentrations of denaturants or at elevated temperatures. Peptides in 
denatured conformation have characteristic CD spectra and they are generally characterized by a lack of 
long range interactions as determined by e.g., NMR. Denatured conformation and unfolded conformation 
will be used synonymously. 

[0075] The terms "unstructured protein (UNP) sequences" and "unstructured recombinant polymer" (URP) are
used herein interchangeably. The terms refer to amino acid sequences that share commonality with
denatured peptide sequences, e.g., exhibiting a typical behavior like denatured peptide sequences, under
physiological conditions, as detailed herein. URP sequences lack a defined tertiary structure and they have
limited or no secondary structure as detected by, e.g., Chou-Fasman algorithm.
[0076] As used herein, the term "cell surface proteins" refers to the plasma membrane components of a cell. It
encompasses integral and peripheral membrane proteins, glycoproteins, polysaccharides and lipids that 
constitute the plasma membrane. An integral membrane protein is a transmembrane protein that extends 
across the lipid bilayer of the plasma membrane of a cell. A typical integral membrane protein consists of at 
least one membrane spanning segment that generally comprises hydrophobic amino acid residues. 
Peripheral membrane proteins do not extend into the hydrophobic interior of the lipid bilayer and they are
bound to the membrane surface via covalent or noncovalent interaction directly or indirectly with other
membrane components. 

[0077] The terms "membrane", "cytosolic", "nuclear" and "secreted" as applied to cellular proteins specify the 
extracellular and/or subcellular location in which the cellular protein is mostly, predominantly, or 
preferentially localized. 

[0078] "Cell surface receptors" represent a subset of membrane proteins, capable of binding to their respective
ligands. Cell surface receptors are molecules anchored on or inserted into the cell plasma membrane. They
constitute a large family of proteins, glycoproteins, polysaccharides and lipids, which serve not only as 
structural constituents of the plasma membrane, but also as regulatory elements governing a variety of 
biological functions. 

[0079] The term "module" refers to a portion of a protein that is physically or functionally distinguished from 
other portions of the protein or peptide. A module can comprise one or more domains. In general, a 
module or domain can be a single, stable three-dimensional structure, regardless of size. The tertiary 
structure of a typical domain is stable in solution and remains the same whether such a member is isolated 
or covalently fused to other domains. A domain generally has a particular tertiary structure formed by the 
spatial relationships of secondary structure elements, such as beta-sheets, alpha helices, and unstructured 
loops. In domains of the microprotein family, disulfide bridges are generally the primary elements that 
determine tertiary structure. In some instances, domains are modules that can confer a specific functional 
activity, such as avidity (multiple binding sites to the same target), multi-specificity (binding sites for
different targets), or halflife (using a domain, cyclic peptide or linear peptide which binds to a serum protein
like human serum albumin (HSA) or to IgG (hIgG1, 2, 3 or 4) or to red blood cells). Functionally-defined
domains have a distinct biological function(s). The ligand-binding domain of a receptor, for example, is 
that domain that binds ligand. An antigen-binding domain refers to the part of an antigen-binding unit or an 
antibody that binds to the antigen. Functionally-defined domains need not be encoded by contiguous amino 
acid sequences. Functionally-defined domains may contain one or more physically-defined domains.
Receptors, for example, are generally divided into the extracellular ligand-binding domain, a 
transmembrane domain, and an intracellular effector domain. A "membrane anchorage domain" refers to 
the portion of a protein that mediates membrane association. Generally, the membrane anchorage domain is 
composed of hydrophobic amino acid residues. Alternatively, the membrane anchorage domain may 
contain modified amino acids, e.g. amino acids that are attached to a fatty acid chain, which in turn anchors 
the protein to a membrane. 

[0080] "Non-naturally occurring" as applied to a protein means that the protein contains at least one amino acid
that is different from the corresponding wildtype or native protein. Non-natural sequences can be
determined by performing a BLAST search using, e.g., the lowest smallest sum probability where the
comparison window is the length of the sequence of interest (the queried) and when compared to the
non-redundant ("nr") database of Genbank using BLAST 2.0. The BLAST 2.0 algorithm is described in
Altschul et al. (1990) J. Mol. Biol. 215:403-410. Software for performing BLAST analyses is
publicly available through the National Center for Biotechnology Information.
[0081] A "host cell" includes an individual cell or cell culture which can be or has been a recipient for the subject 
vectors. Host cells include progeny of a single host cell. The progeny may not necessarily be completely 
identical (in morphology or in genomic or total DNA complement) to the original parent cell due to natural,
accidental, or deliberate mutation. A host cell includes cells transfected in vivo with a vector of this 
invention. 

[0082] As used herein, the term "isolated" means separated from constituents, cellular and otherwise, in which the 
polynucleotide, peptide, polypeptide, protein, antibody, or fragments thereof, are normally associated with 
in nature. As is apparent to those of skill in the art, a non-naturally occurring polynucleotide, peptide,
polypeptide, protein, antibody, or fragments thereof, does not require "isolation" to distinguish it from its
naturally occurring counterpart. In addition, a "concentrated", "separated" or "diluted" polynucleotide,
peptide, polypeptide, protein, antibody, or fragments thereof, is distinguishable from its naturally occurring
counterpart in that the concentration or number of molecules per volume is greater ("concentrated") or
less ("separated" or "diluted") than that of its naturally occurring counterpart.

[0083] "Linked" and "fused" or "fusion" are used interchangeably herein. These terms refer to the joining together
of two or more chemical elements or components, by whatever means including chemical conjugation or
recombinant means. An "in-frame fusion" refers to the joining of two or more open reading frames (ORFs)
to form a continuous longer ORF, in a manner that maintains the correct reading frame of the original
ORFs. Thus, the resulting recombinant fusion protein is a single protein containing two or more segments
that correspond to polypeptides encoded by the original ORFs (which segments are not normally so joined
in nature).

[0084] In the context of polypeptides, a "linear sequence" or a "sequence" is an order of amino acids in a
polypeptide in an amino to carboxyl terminus direction in which residues that neighbor each other in the
sequence are contiguous in the primary structure of the polypeptide. A "partial sequence" is a linear 
sequence of part of a polypeptide which is known to comprise additional residues in one or both directions. 

[0085] "Heterologous" means derived from a genotypically distinct entity from the rest of the entity to which it is 
being compared. For example, a glycine rich sequence removed from its native coding sequence and 
operatively linked to a coding sequence other than the native sequence is a heterologous glycine rich 
sequence. The term "heterologous" as applied to a polynucleotide or a polypeptide means that the
polynucleotide or polypeptide is derived from a genotypically distinct entity from that of the rest of the 
entity to which it is being compared. 

[0086] The terms "polynucleotides", "nucleic acids", "nucleotides" and "oligonucleotides" are used
interchangeably. They refer to a polymeric form of nucleotides of any length, either deoxyribonucleotides
or ribonucleotides, or analogs thereof. Polynucleotides may have any three-dimensional structure, and may 
perform any function, known or unknown. The following are non-limiting examples of polynucleotides: 
coding or non-coding regions of a gene or gene fragment, loci (locus) defined from linkage analysis, exons, 
introns, messenger RNA (mRNA), transfer RNA, ribosomal RNA, ribozymes, cDNA, recombinant 
polynucleotides, branched polynucleotides, plasmids, vectors, isolated DNA of any sequence, isolated 
RNA of any sequence, nucleic acid probes, and primers. A polynucleotide may comprise modified 
nucleotides, such as methylated nucleotides and nucleotide analogs. If present, modifications to the 
nucleotide structure may be imparted before or after assembly of the polymer. The sequence of nucleotides 
may be interrupted by non-nucleotide components. A polynucleotide may be further modified after
polymerization, such as by conjugation with a labeling component. 

[0087] "Recombinant" as applied to a polynucleotide means that the polynucleotide is the product of various
combinations of cloning, restriction and/or ligation steps, and other procedures that result in a construct that
is distinct from a polynucleotide found in nature. 

[0088] The terms "gene" or "gene fragment" are used interchangeably herein. They refer to a polynucleotide
containing at least one open reading frame that is capable of encoding a particular protein after being 
transcribed and translated. A gene or gene fragment may be genomic or cDNA, as long as the 
polynucleotide contains at least one open reading frame, which may cover the entire coding region or a 
segment thereof. A "fusion gene" is a gene composed of at least two heterologous polynucleotides that are 
linked together. 

[0089] A "vector" is a nucleic acid molecule, preferably self-replicating, which transfers an inserted nucleic acid
molecule into and/or between host cells. The term includes vectors that function primarily for insertion of
DNA or RNA into a cell, replication vectors that function primarily for the replication of DNA or RNA,
and expression vectors that function for transcription and/or translation of the DNA or RNA. Also included 
are vectors that provide more than one of the above functions. An "expression vector" is a polynucleotide 
which, when introduced into an appropriate host cell, can be transcribed and translated into a 
polypeptide(s). An "expression system" usually connotes a suitable host cell comprising an expression
vector that can function to yield a desired expression product. 

[0090] The "target" as used in the context of MURPs is a biochemical molecule or structure to which the Binding
Module or the URP-linked Binding Module can bind and where the binding event results in a desired 
biological activity. The target can be a protein ligand or receptor that is inhibited, activated or otherwise 
acted upon by the t protein. Examples of targets are hormones, cytokines, antibodies or antibody 
fragments, cell surface receptors, kinases, growth factors and other biochemical structures with biological 
activity. 

[0091] A "functional module" can be any non-URP in a protein product. Thus a functional module can be a
binding module (BM), an effector module (EM), a multimerization module (MM), a C-terminal module
(CM), or an N-terminal module (NM). In general, functional modules are characterized by a high
information content of their amino acid sequence, i.e., they contain many different amino acids and many of
these amino acids are important for the function of a functional module. A functional module typically has
secondary and tertiary structure, may be a folded protein domain and may contain 1, 2, 3, 4, 5 or more
disulfide bonds.

[0092] The term "microproteins" refers to a classification in the SCOP database. Microproteins are usually the
smallest proteins with a fixed structure and typically but not exclusively have as few as 15 amino acids 
with two disulfides or up to 200 amino acids with more than ten disulfides. A microprotein may contain 
one or more microprotein domains. Some microprotein domains or domain families can have multiple 
more-or-less stable and multiple more or less similar structures which are conferred by different disulfide 
bonding patterns, so the term stable is used in a relative way to differentiate microproteins from peptides 
and non-microprotein domains. Most microprotein toxins are composed of a single domain, but the cell- 
surface receptor microproteins often have multiple domains. Microproteins can be so small because their 
folding is stabilized either by disulfide bonds and/or by ions such as Calcium, Magnesium, Manganese, 
Copper, Zinc, Iron or a variety of other multivalent ions, instead of being stabilized by the typical
hydrophobic core. 

[0093] The term "scaffold" refers to the minimal polypeptide 'framework' or 'sequence motif' that is used as the
conserved, common sequence in the construction of protein libraries. In between the fixed or conserved 
residues/positions of the scaffold lie variable and hypervariable positions. A large diversity of amino acids 
is provided in the variable regions between the fixed scaffold residues to provide specific binding to a 
target molecule. A scaffold is typically defined by the conserved residues that are observed in an alignment 
of a family of sequence-related proteins. Fixed residues may be required for folding or structure, especially 
if the functions of the aligned proteins are different. A full description of a microprotein scaffold may 
include the number, position or spacing and bonding pattern of the cysteines, as well as position and 
identity of any fixed residues in the loops, including binding sites for ions such as Calcium. 

[0094] The "fold" of a microprotein is largely defined by the linkage pattern of the disulfide bonds (i.e., 1-4, 2-6,
3-5). This pattern is a topological constant and is generally not amenable to conversion into another pattern 
without unlinking and relinking the disulfides such as by reduction and oxidation (redox agents). In 
general, natural proteins with related sequences adopt the same disulfide bonding patterns. The major 
determinants are the cysteine distance pattern (CDP) and some fixed non-cys residues, as well as a
metal-binding site, if present. In a few cases the folding of proteins is also influenced by the surrounding sequences
(e.g., pro-peptides) and in some cases by chemical derivatization (e.g., gamma-carboxylation) of residues that
allow the protein to bind divalent metal ions (e.g., Ca++), which assists their folding. For the vast majority of
microproteins such folding help is not required. 

[0095] However, proteins with the same bonding pattern may still comprise multiple folds, based on differences in 
the length and composition of the loops that are large enough to give the protein a rather different structure. 
Examples are the conotoxin, cyclotoxin and anato domain families, which have the same DBP but a very
different CDP and are considered to be different folds. Determinants of a protein fold are any attributes that
greatly alter structure relative to a different fold, such as the number and bonding pattern of the cysteines,
the spacing of the cysteines, differences in the sequence motifs of the inter-cysteine loops (especially fixed
loop residues, which are likely to be needed for folding), or the location or composition of the calcium (or
other metal or co-factor) binding site.

[0096] The term "disulfide bonding pattern" or "DBP" refers to the linking pattern of the cysteines, which are 
numbered 1-n from the N-terminus to the C-terminus of the protein. Disulfide bonding patterns are
topologically constant, meaning they can only be changed by unlinking one or more disulfides such as 
using redox conditions. The possible 2-, 3-, and 4-disulfide bonding patterns are listed below in paragraphs 
0048-0075. 
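The number of possible disulfide bonding patterns follows from pairing up the numbered cysteines in every possible way. The following is a minimal sketch, assuming all cysteines are paired and none remain free; the function names are hypothetical and are introduced here only for illustration.

```python
def disulfide_patterns(cysteines):
    """Yield every way to pair up the given cysteine numbers.

    Each pattern is a list of (i, j) tuples with i < j, ordered by the
    lower-numbered cysteine of each pair. Assumes an even number of
    cysteines, all of which are disulfide bonded.
    """
    if not cysteines:
        yield []
        return
    first, rest = cysteines[0], cysteines[1:]
    for index, partner in enumerate(rest):
        remaining = rest[:index] + rest[index + 1:]
        for tail in disulfide_patterns(remaining):
            yield [(first, partner)] + tail


def format_pattern(pattern):
    """Render a pattern in the '1-4, 2-6, 3-5' notation used in the text."""
    return ", ".join(f"{a}-{b}" for a, b in pattern)


if __name__ == "__main__":
    for n_disulfides in (2, 3, 4):
        cys = list(range(1, 2 * n_disulfides + 1))
        patterns = list(disulfide_patterns(cys))
        print(n_disulfides, "disulfides:", len(patterns), "patterns")
```

Under this assumption the enumeration yields 3 patterns for two disulfides, 15 for three, and 105 for four, and the 1-4, 2-6, 3-5 pattern mentioned above appears among the three-disulfide patterns.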

[0097] The term "cysteine distance pattern" or "CDP" refers to the number of non-cysteine amino acids that 
separate the cysteines on a linear protein chain. Several notations are used: C5C0C3C equals C5CC3C 
equals CxxxxxCCxxxC. 
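The equivalence of the three notations can be checked mechanically. The sketch below, with hypothetical function names, expands the numeric notation into the explicit form and converts the explicit form back into the notation that states every loop length, including zero.

```python
import re


def expand_cdp(cdp):
    """Expand 'C5C0C3C' or 'C5CC3C' into the explicit form 'CxxxxxCCxxxC'."""
    tokens = re.findall(r"C|\d+", cdp)
    if "".join(tokens) != cdp:
        raise ValueError(f"unexpected characters in CDP: {cdp!r}")
    return "".join("C" if tok == "C" else "x" * int(tok) for tok in tokens)


def compact_cdp(expanded):
    """Convert 'CxxxxxCCxxxC' into the explicit-count form 'C5C0C3C'."""
    if not re.fullmatch(r"C(x*C)+", expanded):
        raise ValueError(f"not an expanded CDP: {expanded!r}")
    # Splitting on 'C' leaves the loops between consecutive cysteines.
    loops = expanded.split("C")[1:-1]
    return "C" + "".join(f"{len(loop)}C" for loop in loops)


if __name__ == "__main__":
    print(expand_cdp("C5C0C3C"))
    print(expand_cdp("C5CC3C"))
    print(compact_cdp("CxxxxxCCxxxC"))
```

Both numeric forms expand to the same string, which is the sense in which the notations are equal.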

[0098] The term 'Position n6' or 'n7=4' refers to the intercysteine loops: 'n6' is defined as the loop between C6
and C7; 'n7=4' means the loop between C7 and C8 is 4 amino acids long, not counting the cysteines.

[0099] Serum degradation resistance - Proteins can be eliminated by degradation in the blood, which typically
involves proteases in the serum or plasma. The serum degradation resistance is measured by combining the
protein with human (or mouse, rat, monkey, as appropriate) serum or plasma, typically for a range of days
(e.g., 0.25, 0.5, 1, 2, 4, 8, 16 days) at 37 °C. The samples for these timepoints are then run on a western assay
and the protein is detected with an antibody. The antibody can be to a tag in the protein. If the protein
shows a single band on the western, where the protein's size is identical to that of the injected protein, then
no degradation has occurred. The timepoint where 50% of the protein is degraded, as judged by western, is
the serum degradation halflife of the protein.
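When the 50% point falls between two sampled timepoints, it can be estimated by interpolation. The following is a minimal sketch, assuming that the fraction of intact protein at each timepoint has already been quantified from the western (for example by band densitometry) and that linear interpolation between neighboring timepoints is acceptable. The function name and the numbers in the example are hypothetical and are not data from this specification.

```python
def degradation_half_life(timepoints, fraction_intact):
    """Estimate the time at which 50% of the protein remains intact.

    timepoints      -- sampling times in days, in increasing order
    fraction_intact -- intact fraction (0.0 to 1.0) at each timepoint
    Returns the linearly interpolated crossing time, or None if the
    intact fraction never drops from >= 0.5 to < 0.5 within the series.
    """
    if len(timepoints) != len(fraction_intact):
        raise ValueError("timepoints and fractions must have equal length")
    pairs = list(zip(timepoints, fraction_intact))
    for (t0, f0), (t1, f1) in zip(pairs, pairs[1:]):
        if f0 >= 0.5 > f1:
            # Linear interpolation between the two bracketing samples.
            return t0 + (f0 - 0.5) * (t1 - t0) / (f0 - f1)
    return None


if __name__ == "__main__":
    # Hypothetical measurements for illustration only.
    days = [0, 1, 2, 4, 8]
    intact = [1.0, 0.9, 0.8, 0.6, 0.2]
    print(degradation_half_life(days, intact))
```

A return value of None indicates that the series does not bracket the 50% point, in which case longer or shorter incubation times would be needed.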
[00100] Serum protein binding - While the MURP typically has a number of modules that bind to cell-surface
targets and/or serum proteins, it is desirable that the URP substantially lack unintended activities. The URP
should be designed to minimize or avoid interaction with (binding to) serum proteins, including antibodies.
Different URP designs can be screened for serum protein binding by ELISA, immobilizing the serum 
proteins and then adding the URP, incubating, washing and then detecting the amount of bound URP. One 
approach is to detect the URP using an antibody that recognizes a tag that has been added to the URP. A 
different approach is to immobilize the URP (such as via a fusion to GFP) and come in with human serum, 
incubating, washing, and then detecting the amount of human antibodies that remain bound to the URP 
using secondary antibodies like goat anti-human IgG. Using these approaches we have designed our URPs 
to show very low levels of binding to serum proteins. However, in some applications binding to serum 
proteins or serum-exposed proteins is desired, for example because it can further extend the secretion
halflife. In such cases one can use these same assays to design URPs that bind to serum proteins or
serum-exposed proteins such as HSA or IgG. In other cases the MURP can be given binding modules that contain
peptides that have been designed to bind to serum proteins or serum-exposed proteins such as HSA or IgG.

Unstructured Recombinant Polymers (URPs): 

[00101] One aspect of the present invention is the design of unstructured recombinant polymers (URPs). The
subject URPs are particularly useful for generating recombinant proteins of therapeutic and/or diagnostic
value. The subject URPs exhibit one or more of the following features.

[00102] The subject URPs comprise amino acid sequences that typically share commonality with denatured peptide
sequences under physiological conditions. URP sequences typically behave like denatured peptide 
sequences under physiological conditions. URP sequences lack well defined secondary and tertiary 
structures under physiological conditions. A variety of methods have been established in the art to 
ascertain the secondary and tertiary structures of a given polypeptide. For example, the secondary structure of
a polypeptide can be determined by CD spectroscopy in the "far-UV" spectral region (190-250 nm). 
Alpha-helix, beta-sheet, and random coil structures each give rise to a characteristic shape and magnitude 
of CD spectra. Secondary structure can also be ascertained via certain computer programs or algorithms 
such as the Chou-Fasman algorithm (Chou, P. Y., et al. (1974) Biochemistry, 13: 222-45). For a given 
URP sequence, the algorithm can predict whether there exists some or no secondary structure at all. In 
general, URP sequences will have spectra that resemble denatured sequences due to their low degree of 
secondary and tertiary structure. Where desired, URP sequences can be designed to have predominantly 
denatured conformations under physiological conditions. URP sequences typically have a high degree of
conformational flexibility under physiological conditions and they tend to have large hydrodynamic radii
(Stokes' radius) compared to globular proteins of similar molecular weight. As used herein, physiological
conditions refer to a set of conditions including temperature, salt concentration, and pH that mimic those
conditions of a living subject. A host of physiologically relevant conditions for use in in vitro assays have
been established. Generally, a physiological buffer contains a physiological concentration of salt and is
adjusted to a neutral pH ranging from about 6.5 to about 7.8, and preferably from about 7.0 to about 7.5. A
variety of physiological buffers is listed in Sambrook et al. (1989) supra and hence is not detailed herein.
Physiologically relevant temperature ranges from about 25 °C to about 38 °C, and preferably from about 30 
°C to about 37 °C. 

[00103] The subject URPs can be sequences with low immunogenicity. Low immunogenicity can be a direct result 
of the conformational flexibility of URP sequences. Many antibodies recognize so-called conformational 
epitopes in protein antigens. Conformational epitopes are formed by regions of the protein surface that are 
composed of multiple discontinuous amino acid sequences of the protein antigen. The precise folding of 
the protein brings these sequences into a well-defined spatial configuration that can be recognized by
antibodies. Preferred URPs are designed to avoid formation of conformational epitopes. For example, of
particular interest are URP sequences having a low tendency to adopt compactly folded conformations in
aqueous solution. In particular, low immunogenicity can be achieved by choosing sequences that resist
antigen processing in antigen presenting cells, choosing sequences that do not bind MHC well and/or by
choosing sequences that are derived from human sequences. 
[00104] The subject URPs can be sequences with a high degree of protease resistance. Protease resistance can also 
be a result of the conformational flexibility of URP sequences. Protease resistance can be designed by 
avoiding known protease recognition sites. Alternatively, protease resistant sequences can be selected by 
phage display or related techniques from random or semi-random sequence libraries. Where desired for 
special applications, such as slow release from a depot protein, serum protease cleavage sites can be built 
into an URP. Of particular interest are URP sequences with high stability (e.g., long serum half-life, less 
prone to cleavage by proteases present in bodily fluid) in blood. 
[00105] The subject URP can also be characterized by the effect that, upon its incorporation into a
protein, the protein exhibits a longer serum half-life and/or higher solubility as compared to the
corresponding protein that is deficient in the URP. Methods of ascertaining serum half-life are known in
the art (see e.g., Alvarez, P., et al. (2004) J Biol Chem, 279: 3375-81). One can readily determine whether
the resulting protein has a longer serum half-life as compared to the unmodified protein by practicing any
methods available in the art or exemplified herein.
[00106] The subject URP can be of any length necessary to effect (a) extension of serum half-life of a protein 
comprising the URP; (b) an increase in solubility of the resulting protein; (c) an increased resistance to 
protease; and/or (d) a reduced immunogenicity of the resulting protein that comprises the URP. Typically,
the subject URP has about 30, 40, 50, 60, 70, 80, 90, 100, 150, 200, 300, 400 or more contiguous amino
acids. When incorporated into a protein, the URP can be fragmented such that the resulting protein
contains multiple URPs, or multiple fragments of URPs. Some or all of these individual URP sequences
may be shorter than 40 amino acids as long as the combined length of all URP sequences in the resulting
protein is at least 40 amino acids. Preferably, the resulting protein has a combined length of URP 
sequences exceeding 40, 50, 60, 70, 80, 90, 100, 150, 200 or more amino acids. 
[00107] URPs may have an isoelectric point (pI) of 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0, 5.5, 6.0, 6.5, 7.0, 7.5,
8.0, 8.5, 9.0, 9.5, 10.0, 10.5, 11.0, 11.5, 12.0, 12.5 or even 13.0.
[00108] In general, URP sequences are rich in hydrophilic amino acids and contain a low percentage of
hydrophobic or aromatic amino acids. Suitable hydrophilic residues include but are not limited to glycine,
serine, aspartate, glutamate, lysine, arginine, and threonine. Hydrophobic residues that are less favored in 
construction of URPs include tryptophan, phenylalanine, tyrosine, leucine, isoleucine, valine, and 
methionine. URP sequences can be rich in glycine but URP sequences can also be rich in the amino acids 

WO 2007/103515 PCT/US2007/005952

glutamate, aspartate, serine, threonine, alanine or proline. Thus the predominant amino acid may be G, E, 
D, S, T, A or P. The inclusion of proline residues tends to reduce sensitivity to proteolytic degradation. 

[00109] The inclusion of hydrophilic residues typically increases URPs' solubility in water and aqueous media
under physiological conditions. As a result of their amino acid composition, URP sequences have a low 
tendency to form aggregates in aqueous formulations and the fusion of URP sequences to other proteins or 
peptides tends to enhance their solubility and reduce their tendency to form aggregates, which is a separate 
mechanism to reduce immunogenicity. 

[00110] URP sequences can be designed to avoid certain amino acids that confer undesirable properties to the
protein. For instance, one can design URP sequences to contain few or none of the following amino acids:
cysteine (to avoid disulfide formation and oxidation), methionine (to avoid oxidation), asparagine and
glutamine (to avoid deamidation).

Glycine-rich URPs: 

[00111] In one embodiment, the subject URP comprises a glycine rich sequence (GRS). For example, glycine can
be present predominantly such that it is the most prevalent residue present in the sequence of interest. In
another example, URP sequences can be designed such that glycine residues constitute at least about 30%,
35%, 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, or 100% of the total amino acids.
URPs can also contain 100% glycines. In yet another example, the URPs contain at least 30% glycine and
the total concentration of tryptophan, phenylalanine, tyrosine, valine, leucine, and isoleucine is less than
20%. In still another example, the URPs contain at least 40% glycine and the total concentration of
tryptophan, phenylalanine, tyrosine, valine, leucine, and isoleucine is less than 10%. In still yet another
example, the URPs contain at least about 50% glycine and the total concentration of tryptophan,
phenylalanine, tyrosine, valine, leucine, and isoleucine is less than 5%.
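The compositional examples in the preceding paragraph can be checked for a candidate sequence with a short calculation. The sketch below is illustrative only: the function names and the test sequence are hypothetical, and "concentration" is interpreted here as the fraction of residue positions, which is an assumption.

```python
# Residues whose combined content is limited in the examples above:
# tryptophan, phenylalanine, tyrosine, valine, leucine, isoleucine.
LIMITED_RESIDUES = set("WFYVLI")


def grs_composition(sequence):
    """Return (glycine fraction, W/F/Y/V/L/I fraction) of a sequence."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    glycine = seq.count("G") / len(seq)
    limited = sum(1 for aa in seq if aa in LIMITED_RESIDUES) / len(seq)
    return glycine, limited


def meets_grs_example(sequence, min_glycine, max_limited):
    """True if glycine >= min_glycine and W/F/Y/V/L/I < max_limited."""
    glycine, limited = grs_composition(sequence)
    return glycine >= min_glycine and limited < max_limited


if __name__ == "__main__":
    # Hypothetical 20-residue sequence: 13 Gly, 3 Ser, 3 Glu, 1 Leu.
    candidate = "GGSGGE" * 3 + "GL"
    print(grs_composition(candidate))
    print(meets_grs_example(candidate, 0.30, 0.20))
    print(meets_grs_example(candidate, 0.40, 0.10))
    print(meets_grs_example(candidate, 0.50, 0.05))
```

The three calls correspond to the 30%/20%, 40%/10% and 50%/5% examples. Because each limit is stated as "less than", a sequence whose W/F/Y/V/L/I content equals the limit does not satisfy that example.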

[00112] The length of GRS can vary between about 5 amino acids and 200 amino acids or more. For example,
a single, contiguous GRS can contain 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 70, 80, 90, 100,
120, 140, 160, 180, 200, 240, 280, 320 or 400 or more amino acids. GRS may comprise glycine residues at
both ends. 

[00113] GRS can also have a significant content of other amino acids, for example Ser, Thr, Ala, or Pro. GRS can 
contain a significant fraction of negatively charged amino acids including but not limited to Asp and Glu. 
GRS can contain a significant fraction of positively charged amino acids including but not limited to Arg or 
Lys. Where desired, URPs can be designed to contain only a single type of amino acid (e.g., Gly or Glu),
sometimes only a few types of amino acid, e.g., two to five types of amino acids (e.g., selected from G, E,
D, S, T, A and P), in contrast to typical proteins and typical linkers which generally are composed of most
of the twenty types of amino acids. URPs may contain negatively charged residues (Asp, Glu) in 30, 25,
20, 15, 12, 10, 9, 8, 7, 6, 5, 4, 3, 2, or 1 percent of the amino acid positions.

[00114] Typically, the subject GRS-containing URP has about 30, 40, 50, 60, 70, 80, 90, 100, or more contiguous 
amino acids. When incorporated into a protein, the URP can be fragmented such that the resulting protein 
contains multiple URPs, or multiple fragments of URPs. Some or all of these individual URP sequences
may be shorter than 40 amino acids as long as the combined length of all URP sequences in the resulting
protein is at least 30 amino acids. Preferably, the resulting protein has a combined length of URP 
sequences exceeding 40, 50, 60, 70, 80, 90, 100, or more amino acids. 
[00115] The GRS-containing URPs are of particular interest due, in part, to the increased conformational freedom 
of glycine-containing peptides. Denatured peptides in solution have a high degree of conformational 
freedom. Most of that conformational freedom is lost upon binding of said peptides to a target like a 
receptor, an antibody, or a protease. This loss of entropy needs to be offset by the energy of interaction 
between the peptide and its target. The degree of conformational freedom of a denatured peptide is 
dependent on its amino acid sequence. Peptides containing many amino acids with small side chains tend 
to have more conformational freedom than peptides that are composed of amino acids with larger side 
chains. Peptides containing the amino acid glycine have particularly large degrees of freedom. It has been 
estimated that glycine-containing peptide bonds have about 3.4 times more entropy in solution as compared 
to corresponding alanine-containing sequences (D'Aquino, J. A., et al. (1996) Proteins, 25: 143-56). This 
factor increases with the number of glycine residues in a sequence. As a result, such peptides tend to lose 
more entropy upon binding to targets, which reduces their overall ability to interact with other proteins as 
well as their ability to adopt defined three-dimensional structures. The large conformational flexibility of 
glycine-peptide bonds is also evident when analyzing Ramachandran plots of protein structures where 
glycine peptide bonds occupy areas that are rarely occupied by other peptide bonds (Venkatachalam, C. M., 
et al. (1969) Annu Rev Biochem, 38: 45-82). Stites et al. studied a database of 12,320 residues from 61 
nonhomologous, high resolution crystal structures to determine the phi, psi conformational preferences of 
each of the 20 amino acids. The observed distributions in the native state of proteins are assumed to also 
reflect the distributions found in the denatured state. The distributions were used to approximate the energy 
surface for each residue, allowing the calculation of relative conformational entropies for each residue 
relative to glycine. In the most extreme case, replacement of glycine by proline, conformational entropy 
changes will stabilize the native state relative to the denatured state by -0.82 +/- 0.08 kcal/mol at 20° C 
(Stites, W. E., et al. (1995) Proteins, 22: 132). These observations confirm the special role of glycine 
among the 20 natural amino acids. 

[00116] In designing the subject URPs, natural or non-natural sequences can be used. For example, a host of 
natural sequences containing high glycine content is provided in Table 1, Table 2, Table 3, and Table 4. 
One skilled in the art may adopt any one of the sequences as a URP, or modify the sequences to achieve 
the intended properties. Where immunogenicity to the host subject is of concern, it is preferable to design 
GRS-containing URPs based on glycine rich sequences derived from the host. Preferred GRS-containing 
URPs are sequences from human proteins or sequences that share substantial homology to the 
corresponding glycine rich sequences in the reference human proteins. 

[00117] 

Table 1. Structural analysis of proteins that contain glycine rich sequences 

PDB file    Protein function               Glycine rich sequences 
1K3V        Porcine Parvovirus capsid      sgggggggggrgagg 
1FPV        Feline Panleukopenia Virus     tgsgngsgggggggsgg 
IDS         CpV strain D, mutant A300d     tgsgngsgggggggsgg 
1MVM        Mvm (strain I) virus           ggsggggsgggg 

Table 2. Open reading frames encoding GRS with 300 or more glycine residues 

Accession      Organism                    Gly (%)   GRS length   Gene length   Predicted Function 
NP_974499      Arabidopsis thaliana        64        509          579           unknown 
ZP_00458077    Burkholderia cenocepacia    66        373          518           putative lipoprotein 
XP_477841      Oryza sativa                74        371          422           unknown 
NP_910409      Oryza sativa                75        368          400           putative cell-wall precursor 
NP_610660      Drosophila melanogaster     66        322          610           transposable element 


Table 3. Examples of human GRS 

Accession    Gly (%)   GRS length   Gene length   Hydrophobics   Predicted Function 
NP_000217    62        135          622           yes            keratin 9 
NP_531961    61        73           592           yes            TBP-associated factor 15 isoform 1 
NP_476429    65        70           629           yes            keratin 3 
NP_000418    70        66           316           yes            loricrin, cell envelope 
NP_056932    60        66           638           yes            cytokeratin 2 


Table 4. Additional examples of human GRS 

Accession    Sequences                                Number of amino acids 
NP_006228    GPGGGGGPGGGGGPGGGGPGGGGGGGPGGGGGGPGGG    37 
NP_787059    GAGGGGGGGGGGGGGSGGGGGGGGAGAGGAGAG        33 
NP_009060    GGGSGSGGAGGGSGGGSGSGGGGGGAGGGGGG         32 
NP_031393    GDGGGAGGGGGGGGSGGGGSGGGGGGG              27 
NP_005850    GSGSGSGGGGGGGGGGGGSGGGGGG                25 
NP_061856    GGGRGGRGGGRGGGGRGGGRGGG                  22 
NP_787059    GAGGGGGGGGGGGGGSGGGGGGGGAGAGGAGAG        33 
NP_009060    GGGSGSGGAGGGSGGGSGSGGGGGGAGGGGGG         32 
NP_031393    GDGGGAGGGGGGGGSGGGGSGGGGGGG              27 
NP_115818    GSGGSGGSGGGPGPGPGGGGG                    21 
XP_376532    GEGGGGGGEGGGAGGGSG                       18 
NP_065104    GGGGGGGGDGGG                             12 



GGGSGSGGAGGGSGGGSGSGGGGGGAGGGGGGSSGGGSGTAGGHSO 

POU domain, class 4, transcription factor 1 [Homo sapiens] 
GPGGGGGPGGGGGPGGGGPGGGGGGGPGGGGGGPGGG 

YEATS domain containing 2 [Homo sapiens] 
GGSGAGGGGGGGGGGGSGSGGGGSTGGGGGTAGGG 

AT rich interactive domain 1B (SWI1-like) isoform 3; BRG1-binding protein ELD/OSA1; Eld (eyelid)/Osa protein 
[Homo sapiens] 
GAGGGGGGGGGGGGGSGGGGGGGGAGAGGAGAG 

AT rich interactive domain 1B (SWI1-like) isoform 2; BRG1-binding protein ELD/OSA1; Eld (eyelid)/Osa protein 
[Homo sapiens] 
GAGGGGGGGGGGGGGSGGGGGGGGAGAGGAGAG 

AT rich interactive domain 1B (SWI1-like) isoform 1; BRG1-binding protein ELD/OSA1; Eld (eyelid)/Osa protein 
[Homo sapiens] 
GAGGGGGGGGGGGGGSGGGGGGGGAGAGGAGAG 

purine-rich element binding protein A; purine-rich single-stranded DNA-binding protein alpha; transcriptional 
activator protein PUR-alpha [Homo sapiens] 
GHPGSGSGSGGGGGGGGGGGGSGGGGGGAPGG 

regulatory factor X1; trans-acting regulatory factor 1; enhancer factor C; MHC class II regulatory factor RFX 
[Homo sapiens] 
GGGGSGGGGGGGGGGGGGGSGSTGG^snAO 

bromo domain-containing protein disrupted in leukemia [Homo sapiens] 
GGRGRGGRGRGSRGRGGGGTRGRGRGRGGRG 

unknown protein [Homo sapiens] 
GSGGSGGSGGGPGPGPGGGGGPSGSGSGPG 

PREDICTED: hypothetical protein XP_059256 [Homo sapiens] 
GGGGGGGGGGGRGGGGKGttTZaattvaart 

zinc finger protein 281; ZNP-99 transcription factor [Homo sapiens] 
GGGGTGSSGGSGSGGGGSGGGGGGGSSG 

RNA binding protein (autoantigenic, hnRNP-associated with lethal yellow) short isoform; RNA-binding protein 
(autoantigenic); RNA-binding protein (autoantigenic, hnRNP-associated with lethal yellow) [Homo sapiens] 
GPGGGAGGGGGGGGSGGGGSGGGGGGG 

signal recognition particle 68kDa [Homo sapiens] 
GGGGGGGSGGGGGSGGGGSGGGHGAGG 

KIAA0265 protein [Homo sapiens] 
GGGAAGAGGGGSGAGGGSGGSGGRGTG 

engrailed homolog 2; Engrailed-2 [Homo sapiens] 
GAGGGRGGGAGGEGGASGAEGGGGAGG 

RNA binding protein (autoantigenic, hnRNP-associated with lethal yellow) long isoform; RNA-binding protein 
(autoantigenic); RNA-binding protein (autoantigenic, hnRNP-associated with lethal yellow) [Homo sapiens] 
GDGGGAGGGGGGGGSGGGGSGGGGGGG 

androgen receptor; dihydrotestosterone receptor [Homo sapiens] 
GGGGGGGGGGGGGGGGGGGGGGGEAG 

homeo box D11; homeo box 4F; Hox-4.6, mouse, homolog of; homeobox protein Hox-D11 [Homo sapiens] 
GGGGGGSAGGGSSGGGPGGGGGGAGG 

frizzled 8; frizzled (Drosophila) homolog 8 [Homo sapiens] 
GGGGGPGGGGGGGPGGGGGPGGGGG 

ocular development-associated gene [Homo sapiens] 
GRGGAGSGGAGSGAAGGTGSSGGGG 

homeo box B3; homeo box 2G; homeobox protein Hox-B3 [Homo sapiens] 
GGGGGGGGGGGSGGSGGGGGGGGGG 

chromosome 2 open reading frame 29 [Homo sapiens] 
GGSGGGRGGASGPGSGSGGPGGPAG 

DKFZP564F0522 protein [Homo sapiens] 
GGHHGDRGGGRGGRGGRGGRGGRAG 

PREDICTED: similar to Homeobox even-skipped homolog protein 2 (EVX-2) [Homo sapiens] 
GSRGGGGGGGGGGGGGGGGAGAGGO 

ras homolog gene family, member U; Ryu GTPase; Wnt-1 responsive Cdc42 homolog; 2310026M05Rik; GTP- 
binding protein like 1; CDC42-like GTPase [Homo sapiens] 
GGRGGRGPGEPGGRGRAGGAEGRG 

scratch 2 protein; transcriptional repressor scratch 2; scratch (drosophila homolog) 2, zinc finger protein [Homo 
sapiens] 
GGGGGPAGGSGDAGGAGGRAGRAG 

nucleolar protein family A, member 1; GAR1 protein [Homo sapiens] 
GGGRGGRGGGRGGGGRGGGRGGG 

keratin 1; Keratin-1; cytokeratin 1; hair alpha protein [Homo sapiens] 
GGSGGGGGGSSGGRGSGGGSSGG 

hypothetical protein FLJ31413 [Homo sapiens] 
GSGPGTGGGGSGSGGGGGGSGGG 

one cut domain, family member 2; onecut 2 [Homo sapiens] 
GARGGGSGGGGGGGGGGGGGGPG 

POU domain, class 3, transcription factor 2 [Homo sapiens] 
GGGGGGGGGGGGGGGGGGGGGDG 

PREDICTED: similar to THO complex subunit 4 (Tho4) (RNA and export factor binding protein 1) (REF1-I) (Ally 
of AML-1 and LEF-1) (Aly/REF) [Homo sapiens] 
GGTRGGTRGGTRGGDRGRGRGAG 

PREDICTED: similar to THO complex subunit 4 (Tho4) (RNA and export factor binding protein 1) (REF1-I) (Ally 
of AML-1 and LEF-1) (Aly/REF) [Homo sapiens] 
GGTRGGTRGGTRGGDRGRGRGAG 

POU domain, class 3, transcription factor 3 [Homo sapiens] 
GAGGGGGGGGGGGGGGAGGGGGG 

nucleolar protein family A, member 1; GAR1 protein [Homo sapiens] 
GGGRGGRGGGRGGGGRGGGRGGG 

fibrillarin; 34-kD nucleolar scleroderma antigen; RNA, U3 small nucleolar interacting protein 1 [Homo sapiens] 
GRGRGGGGGGGGGGGGGRGGGG 

zinc finger protein 579 [Homo sapiens] 
GRGRGRGRGRGRGRGRGRGGAG 

calpain, small subunit 1; calcium-activated neutral proteinase; calpain, small polypeptide; calpain 4, small subunit 
(30K); calcium-dependent protease, small subunit [Homo sapiens] 
GAGGGGGGGGGGGGGGGGGGGG 

keratin 9 [Homo sapiens] 
GGGSGGGHSGGSGGGHSGGSGG 

forkhead box D1; forkhead-related activator 4; Forkhead, drosophila, homolog-like 8; forkhead (Drosophila)-like 8 
[Homo sapiens] 
GAGAGGGGGGGGAGGGGSAGSG 

PREDICTED: similar to RIKEN cDNA C230094B15 [Homo sapiens] 
GGPGTGSGGGGAGTGGGAGGPG 

GGGGGGGGGAGGAGGAGSAGGG 

cadherin 22 precursor; ortholog of rat PB-cadherin [Homo sapiens] 
GGDGGGSAGGGAGGGSGGGAG 

AT-binding transcription factor 1; AT motif-binding factor 1 [Homo sapiens] 
GGGGGGSGGGGGGGGGGGGGG 

eomesodermin; t box, brain, 2; eomesodermin (Xenopus laevis) homolog [Homo sapiens] 
GPGAGAGSGAGGSSGGGGGPG 

phosphatidylinositol transfer protein, membrane-associated 2; PYK2 N-terminal domain-interacting receptor 3; 
retinal degeneration B alpha 2 (Drosophila) [Homo sapiens] 
GGGGGGGGGGGSSGGGGSSGG 

sperm associated antigen 8 isoform 2; sperm membrane protein 1 [Homo sapiens] 
GSGSGPGPGSGPGSGPGHGSG 

PREDICTED: RNA binding motif protein 27 [Homo sapiens] 
GPGPGPGPGPGPGPGPGPGPG 

AP1 gamma subunit binding protein 1 isoform 1; gamma-synergin; adaptor-related protein complex 1 gamma 
subunit-binding protein 1 [Homo sapiens] 
GAGSGGGGAAGAGAGSAGGGG 

AP1 gamma subunit binding protein 1 isoform 2; gamma-synergin; adaptor-related protein complex 1 gamma 
subunit-binding protein 1 [Homo sapiens] 
GAGSGGGGAAGAGAGSAGGGG 

ankyrin repeat and sterile alpha motif domain containing 1; ankyrin repeat and SAM domain containing 1 [Homo 
sapiens] 
GGGGGGGSGGGGGGSGGGGGG 

methyl-CpG binding domain protein 2 isoform 1 [Homo sapiens] 
GRGRGRGRGRGRGRGRGRGRG 

triple functional domain (PTPRF interacting) [Homo sapiens] 
GGGGGGGSGGSGGGGGSGGGG 

forkhead box D3 [Homo sapiens] 
GGEEGGASGGGPGAGSGSAGG 

sperm associated antigen 8 isoform 1; sperm membrane protein 1 [Homo sapiens] 
GSGSGPGPGSGPGSGPGHGSG 

methyl-CpG binding domain protein 2 testis-specific isoform [Homo sapiens] 
GRGRGRGRGRGRGRGRGRGRG 

cell death regulator aven; programmed cell death 12 [Homo sapiens] 
GGGGGGGGDGGGRRGRGRGRG 

regulator of nonsense transcripts 1; delta helicase; up-frameshift mutation 1 homolog (S. cerevisiae); nonsense 
mRNA reducing factor 1; yeast Upf1p homolog [Homo sapiens] 
GGPGG^PGGGGAGGPGGAGAG 

small conductance calcium-activated potassium channel protein 2 isoform a; apamin-sensitive small-conductance 
Ca2+-activated potassium channel [Homo sapiens] 
GTGGGGSTGGGGGGGGSGHG 

SRY (sex determining region Y)-box 1; SRY-related HMG-box gene 1 [Homo sapiens] 
GPAGAGGGGGGGGGGGGGGG 

transcription factor 20 isoform 2; stromelysin-1 platelet-derived growth factor-responsive element binding protein; 
stromelysin 1 PDGF-responsive element-binding protein; SPRE-binding protein; nuclear factor SPBP [Homo 
sapiens] 
GGTGGSSGSSGSGSGGGRRG 

transcription factor 20 isoform 1; stromelysin-1 platelet-derived growth factor-responsive element binding protein; 
stromelysin 1 PDGF-responsive element-binding protein; SPRE-binding protein; nuclear factor SPBP [Homo 
sapiens] 
GGTGGSSGSSGSGSGGGRRG 

Ras-interacting protein 1 [Homo sapiens] 
GSGTGTTGSSGAGGPGTPGG 

BMP-2 inducible kinase isoform b [Homo sapiens] 
GGSGGGAAGGGAGGAGAGAG 

BMP-2 inducible kinase isoform a [Homo sapiens] 
GGSGGGAAGGGAGGAGAGAG 

forkhead box C1; forkhead-related activator 3; Forkhead, drosophila, homolog-like 7; forkhead (Drosophila)-like 7; 
iridogoniodysgenesis type 1 [Homo sapiens] 
GSSGGGGGGAGAAGGAGGAG 

splicing factor p54; arginine-rich 54 kDa nuclear protein [Homo sapiens] 
GPGPSGGPGGGGGGGGGGGG 

v-maf musculoaponeurotic fibrosarcoma oncogene homolog; Avian musculoaponeurotic fibrosarcoma (MAF) 
protooncogene; v-maf musculoaponeurotic fibrosarcoma (avian) oncogene homolog [Homo sapiens] 
GGGGGGGGGGGGGGAAGAfiG 

small nuclear ribonucleoprotein D1 polypeptide 16kDa; snRNP core protein D1; Sm-D autoantigen; small nuclear 
ribonucleoprotein D1 polypeptide (16kD) [Homo sapiens] 
GRGRGRGRGRGRGRGRGRGG 

hypothetical protein H41 [Homo sapiens] 
GSAGGSSGAAGAAGGGAGAG 

URPs containing non-glycine residues (NGR): 
[00118] The sequences of non-glycine residues in these GRS can be selected to optimize the properties of URPs and 
hence the proteins that contain the desired URPs. For instance, one can optimize the sequences of URPs to 
enhance the selectivity of the resulting protein for a particular tissue, specific cell type or cell lineage. For 
example, one can incorporate protein sequences that are not ubiquitously expressed, but rather are 
differentially expressed in one or more of the body tissues including heart, liver, prostate, lung, kidney, 
bone marrow, blood, skin, bladder, brain, muscles, nerves, and selected tissues that are affected by diseases 
such as infectious diseases, autoimmune disease, renal, neuronal, cardiac disorders and cancers. One can 
employ sequences representative of a specific developmental origin, such as those expressed in an embryo 
or an adult, during ectoderm, endoderm or mesoderm formation in a multi-cellular organism. One can also 
utilize sequences involved in a specific biological process, including but not limited to cell cycle regulation, 
cell differentiation, apoptosis, chemotaxis, cell motility and cytoskeletal rearrangement. One can also 
utilize other non-ubiquitously expressed protein sequences to direct the resulting protein to specific 
subcellular locations: extracellular matrix, nucleus, cytoplasm, cytoskeleton, plasma and/or intracellular 
membranous structures which include but are not limited to coated pits, Golgi apparatus, endoplasmic 
reticulum, endosome, lysosome, and mitochondria. 

[00119] A variety of these tissue-specific, cell-type specific, subcellular location specific sequences are known and 
available from numerous protein databases. Such selective URP sequences can be obtained by generating 
libraries of random or semi-random URP sequences, injecting them into animals or patients, and 
determining sequences with the desired tissue selectivity in tissue samples. Sequence determination can be 
performed by mass spectrometry. Using similar methods one can select URP sequences that facilitate oral, 
buccal, intestinal, nasal, thecal, peritoneal, pulmonary, rectal, or dermal uptake. 

[00120] Of particular interest are URP sequences that contain regions that are relatively rich in the positively 
charged amino acids arginine or lysine, which favor cellular uptake or transport through membranes. URP 
sequences can be designed to contain one or several protease-sensitive sequences. Such URP sequences 
can be cleaved once the product of the invention has reached its target location. This cleavage may trigger 
an increase in potency of the pharmaceutically active domain (pro-drug activation) or it may enhance 
binding of the cleavage product to a receptor. URP sequences can be designed to carry excess negative 
charges by introducing aspartic acid or glutamic acid residues. Of particular interest are URPs that contain 
greater than 5%, 6%, 7%, 8%, 9%, 10%, 15%, 30% or more glutamic acid and less than 2% 
lysine or arginine. Such URPs carry an excess negative charge and as a result they have a tendency to 
adopt open conformations due to electrostatic repulsion between individual negative charges of the peptide. 
Such an excess negative charge leads to an effective increase in their hydrodynamic radius and as a result it 
can lead to reduced kidney clearance of such molecules. Thus, one can modulate the effective net charge 
and hydrodynamic radius of a URP sequence by controlling the frequency and distribution of negatively 
charged amino acids in the URP sequences. Most tissues and surfaces in a human or animal carry excess 
negative charges. By designing URP sequences to carry excess negative charges one can minimize non- 
specific interactions between the resulting protein comprising the URP and various surfaces such as blood 
vessels, healthy tissues, or various receptors. 
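
By way of illustration only, the composition criteria described above (more than 5% glutamic acid and less than 2% lysine or arginine) can be checked computationally. The following Python sketch is not part of the original disclosure. The function names are hypothetical, and the net charge is a simplifying assumption that counts only Asp and Glu as -1 and Lys and Arg as +1, ignoring histidine and the chain termini.

```python
def charge_profile(seq: str) -> dict:
    """Composition-based charge statistics for a peptide in one-letter code."""
    seq = seq.upper()
    n = len(seq)
    if n == 0:
        raise ValueError("sequence must not be empty")
    negative = seq.count("D") + seq.count("E")
    positive = seq.count("K") + seq.count("R")
    return {
        "glu_fraction": seq.count("E") / n,
        "lys_arg_fraction": positive / n,
        # Assumption: only D/E (-1) and K/R (+1) contribute to the net charge.
        "net_charge": positive - negative,
    }


def is_negatively_charged_urp(seq: str,
                              min_glu: float = 0.05,
                              max_lys_arg: float = 0.02) -> bool:
    """True if Glu exceeds min_glu and Lys+Arg stays below max_lys_arg."""
    profile = charge_profile(seq)
    return (profile["glu_fraction"] > min_glu
            and profile["lys_arg_fraction"] < max_lys_arg)
```

For example, a hypothetical 60-residue repeat of GGSGGE contains about 17% glutamic acid and no lysine or arginine, so it satisfies both thresholds and carries an approximate net charge of -10 under the stated assumption.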

[00121] URPs may have a repetitive amino acid sequence of the format (Motif)x in which a sequence motif forms a 
direct repeat (i.e., ABCABCABCABC) or an inverted repeat (ABCCBAABCCBA) and the number of these 
repeats can be 2, 3, 4, 5, 6, 7, 8, 9, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 35, 40, 50 or more. URPs or the 
repeats inside URPs often contain only 1, 2, 3, 4, 5 or 6 different types of amino acids. URPs typically consist of 
repeats of human amino acid sequences that are 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 22, 
24, 26, 28, 30, 32, 34, 36 or more amino acids long, but URPs may also consist of non-human amino acid 
sequences that are 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 40, 42, 44, 46, 48, 50 amino acids long. 
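
The two repeat formats named above can be sketched as follows. This Python fragment is illustrative only and is not part of the original disclosure. The helper names are hypothetical, and counting an inverted repeat in units of one motif plus its reversed copy is an assumption made here for simplicity.

```python
def direct_repeat(motif: str, copies: int) -> str:
    """Concatenate `copies` head-to-tail copies of the motif."""
    if copies < 1:
        raise ValueError("copies must be at least 1")
    return motif * copies


def inverted_repeat(motif: str, pairs: int) -> str:
    """Concatenate `pairs` units, each being the motif followed by its reverse."""
    if pairs < 1:
        raise ValueError("pairs must be at least 1")
    return (motif + motif[::-1]) * pairs


def residue_types(seq: str) -> int:
    """Number of different amino acid types used in the sequence."""
    return len(set(seq.upper()))
```

With the placeholder motif ABC, `direct_repeat("ABC", 4)` gives ABCABCABCABC and `inverted_repeat("ABC", 2)` gives ABCCBAABCCBA, matching the two patterns written out in the paragraph above.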

URPs derived from human sequences: 
[00122] URPs can be derived from human sequences. The human genome contains many subsequences that are 
rich in one particular amino acid. Of particular interest are such amino acid sequences that are rich in a 
hydrophilic amino acid like serine, threonine, glutamate, aspartate, or glycine. Of particular interest are 
such subsequences that contain few hydrophobic amino acids. Such subsequences are predicted to be 
unstructured and highly soluble in aqueous solution. Such human subsequences can be modified to 
further improve their utility. Figure 17 shows an exemplary human sequence that is rich in serine and that 
can be isolated as the subject URP. The exemplified dentin sialophosphoprotein contains a 670-amino acid 
subsequence in which 64% of the residues are serine and most other positions are hydrophilic amino acids 
such as aspartate, asparagine, and glutamate. The sequence is extremely repetitive and as a result it has a 
low information content. One can directly use subsequences of such a human protein. Where desired, one 
can modify the sequence in a way that preserves its overall character but which makes it more suitable for 
pharmaceutical applications. Examples of sequences that are related to dentin sialophosphoprotein are 
(SSD)n, (SSDSSN)n, and (SSE)n, where n is between about 4 and 200. 

[00123] The use of sequences from human proteins is particularly desirable in design of URPs with reduced 
immunogenicity in a human subject. A key step for eliciting an immune response to a foreign protein is the 
presentation of peptide fragments of said protein by MHC class II receptors. These MHCII-bound 
fragments can then be detected by T cell receptors, which triggers the proliferation of T helper cells and 
initiates an immune response. The elimination of T cell epitopes from pharmaceutical proteins has been 
recognized as a means to reduce the risk of eliciting an immune reaction (Stickler, M., et al. (2003) J 
Immunol Methods, 281: 95-108). MHCII receptors typically interact with an epitope consisting of, e.g., a 
9-amino acid long region of the displayed peptide. Thus, one can reduce the risk of eliciting an immune 
response to a protein in patients if all or most of the possible 9mer subsequences of the protein can be 
found in human proteins; if so, these sequences and repeats of these sequences will not be recognized 
by the patient as foreign sequences. One can incorporate human sequences into the design of URP 
sequences by oligomerizing or concatenating human sequences that have suitable amino acid compositions. 
These can be direct repeats or inverted repeats or mixtures of different repeats. For instance one can 
oligomerize the sequences shown in table 2. Such oligomers have reduced risk of being immunogenic. 
However, the junction sequences between the monomer units can still contain T cell epitopes that can 
trigger an immune reaction, which is illustrated in figure 3. One can further reduce the risk of eliciting an 
immune response by designing URP sequences based on multiple overlapping human sequences. This 
approach is illustrated in figure 4. The URP sequence in figure 2 is designed as an oligomer based on 
multiple human sequences such that each 9mer subsequence of the oligomer can be found in a human 
protein. In these designs, every 9-mer subsequence is a human sequence. An example of a URP sequence 
based on three human sequences is shown in figure 5. It is also possible to design URP sequences based on 
a single human sequence such that all possible 9mer subsequences in the oligomeric URP sequences occur 
in the same human protein. An example is shown in figure 6 based on the POU domain that is rich in 
glycine and proline. The repeating monomer in the URP sequence is only a fragment of the human protein 
and its flanking sequences are identical to the repeating unit as illustrated in figure 6. Non-oligomeric URP 
sequences can be designed based on human proteins as well. The primary condition is that all 9mer 
subsequences can be found in human sequences. The amino acid composition of the sequences preferably 
contains few hydrophobic residues. Of particular interest are URP sequences that are designed based on 
human sequences and that contain a large fraction of glycine residues. 
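
The 9mer criterion described above lends itself to a simple computational check. The following Python sketch is offered for illustration only and is not part of the original disclosure. The function names and the toy "human" sequence are hypothetical, and a practical implementation would compare against a full human protein database rather than a short list of strings.

```python
def kmers(seq: str, k: int = 9) -> set:
    """All contiguous subsequences of length k (empty if seq is shorter than k)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def non_human_kmers(candidate: str, human_proteins: list, k: int = 9) -> list:
    """k-mers of the candidate that occur in none of the reference proteins."""
    human = set()
    for protein in human_proteins:
        human |= kmers(protein, k)
    return sorted(kmers(candidate, k) - human)


# Hypothetical reference sequence and monomer, used only to show the junction effect.
toy_human = ["MKGSGPGSGTEEL"]
monomer = "GSGPGSGTE"          # a 9-residue fragment found in toy_human
dimer = monomer * 2            # oligomerizing creates new junction 9mers
```

In this toy case the monomer itself yields no non-human 9mers, whereas the dimer yields eight, all of which span the junction between the two monomer units. This is the junction effect noted in the paragraph above, and it is the reason a design based on overlapping human sequences is needed if every 9mer is to be human.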

[00124] Utilizing this or a similar scheme, one can design a class of URPs that comprise repeat sequences with low 
immunogenicity to the host of interest. The host of interest can be any animal, including vertebrates and 
invertebrates. Preferred hosts are mammals such as primates (e.g. chimpanzees and humans), cetaceans 
(e.g. whales and dolphins), chiropterans (e.g. bats), perissodactyls (e.g. horses and rhinoceroses), rodents 
(e.g. rats), and certain kinds of insectivores such as shrews, moles and hedgehogs. Where a human is 
selected as the host, the URPs typically contain multiple copies of the repeat sequences or units, wherein 
the majority of segments comprising about 6 to about 15 contiguous amino acids are present in one or more 
native human proteins. One can also design URPs in which the majority of segments comprising between 
about 9 to about 15 contiguous amino acids are found in one or more native human proteins. As used 
herein, majority of the segments refers to more than about 50%, preferably 60%, preferably 70%, 
preferably 80%, preferably 90%, preferably 100%. Where desired, each of the possible segments of between 
about 6 to 15 amino acids, preferably between about 9 to 15 amino acids, within the repeating units is 
present in one or more native human proteins. The URPs can comprise multiple repeating units or 
sequences, for example having 2, 3, 4, 5, 6, 7, 8, 9, 10, or more repeating units. 

Design of URPs that are substantially free of human T-cell epitopes: 
[00125] URP sequences can be designed to be substantially free of epitopes recognized by human T cells. For 
instance, one can synthesize a series of semi-random sequences with amino acid compositions that favor 
denatured, unstructured conformations and evaluate these sequences for the presence of human T cell 
epitopes and whether they are human sequences. Assays for human T cell epitopes have been described 
(Stickler, M., et al. (2003) J Immunol Methods, 281: 95-108). Of particular interest are peptide sequences 
that can be oligomerized without generating T cell epitopes or non-human sequences. This can be achieved 
by testing direct repeats of these sequences for the presence of T-cell epitopes and for the occurrence of 6 
to 15-mer and in particular 9-mer subsequences that are not human. An alternative is to evaluate multiple 
peptide sequences that can be assembled into repeating units as described in the previous section for the 
assembly of human sequences. Another alternative is to design URP sequences that result in low scores 
using epitope prediction algorithms like TEPITOPE (Sturniolo, T., et al. (1999) Nat Biotechnol, 17: 555- 
61). Another approach to avoiding T-cell epitopes is to avoid amino acids that can serve as anchor residues 
during peptide display on MHC, such as M, I, L, V, F. Hydrophobic amino acids and positively charged 
amino acids can frequently serve as such anchor residues and minimizing their frequency in a URP 
sequence reduces the chance of generating T-cell epitopes and thus eliciting an immune reaction. The 
selected URPs generally contain subsequences that are found in at least one human protein, and have a 
lower content of hydrophobic amino acids. 

[00126] URP sequences can be designed to optimize protein production. This can be achieved by avoiding or 
minimizing repetitiveness of the encoding DNA. URP sequences such as poly-glycine may have very 
desirable pharmaceutical properties but their manufacturing can be difficult due to the high GC-content of 
DNA sequences encoding for GRS and due to the presence of repeating DNA sequences that can lead to 
recombination. 
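
To illustrate how the repetitiveness of an encoding DNA can be reduced without changing the encoded amino acid sequence, the following Python sketch varies synonymous codons at random. It is illustrative only and is not part of the original disclosure. The function names, the restriction of the codon table to seven residues, the fixed random seed, and the use of the number of distinct 12-nucleotide windows as a crude repetitiveness measure are all assumptions made here. The codon assignments themselves follow the standard genetic code.

```python
import random

# Standard genetic code, restricted to residues commonly named for URPs.
SYNONYMOUS_CODONS = {
    "G": ["GGT", "GGC", "GGA", "GGG"],
    "S": ["TCT", "TCC", "TCA", "TCG", "AGT", "AGC"],
    "E": ["GAA", "GAG"],
    "D": ["GAT", "GAC"],
    "T": ["ACT", "ACC", "ACA", "ACG"],
    "A": ["GCT", "GCC", "GCA", "GCG"],
    "P": ["CCT", "CCC", "CCA", "CCG"],
}
CODON_TO_AA = {c: aa for aa, codons in SYNONYMOUS_CODONS.items() for c in codons}


def encode(protein: str, rng=None) -> str:
    """Encode a protein; use the first codon if rng is None, else a random one."""
    dna = []
    for aa in protein:
        codons = SYNONYMOUS_CODONS[aa]
        dna.append(codons[0] if rng is None else rng.choice(codons))
    return "".join(dna)


def translate(dna: str) -> str:
    """Translate an in-frame DNA string back to one-letter amino acid codes."""
    return "".join(CODON_TO_AA[dna[i:i + 3]] for i in range(0, len(dna), 3))


def distinct_windows(dna: str, k: int = 12) -> int:
    """Number of different k-nucleotide windows; low values mean repetitive DNA."""
    return len({dna[i:i + k] for i in range(len(dna) - k + 1)})


def gc_content(dna: str) -> float:
    """Fraction of G and C nucleotides in the DNA string."""
    return (dna.count("G") + dna.count("C")) / len(dna)


protein = "G" * 30
fixed = encode(protein)                      # one codon throughout
varied = encode(protein, random.Random(0))   # random synonymous codons
```

Both DNA strings encode the same 30-residue poly-glycine, but the single-codon version contains only three distinct 12-nucleotide windows. Note that codon variation does not remove the high GC content of glycine codons, since all four begin with GG.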

[00127] As noted above, URP sequences can be designed to be highly repetitive at the amino acid level. As a result 
the URP sequences have very low information content and the risk of eliciting an immune reaction can be 
reduced. 

[00128] Non-limiting examples of URPs containing repeating amino acids are: poly-glycine, poly-glutamic acid, 
poly-aspartic acid, poly-serine, poly-threonine, (GX)n where G is glycine and X is serine, aspartic acid, 
glutamic acid, threonine, or proline and n is at least 20, (GGX)n where X is serine, aspartic acid, glutamic 
acid, threonine, or proline and n is at least 13, (GGGX)n where X is serine, aspartic acid, glutamic acid, 
threonine, or proline and n is at least 10, (GGGGX)n where X is serine, aspartic acid, glutamic acid, 
threonine, or proline and n is at least 8, (GzX)n where X is serine, aspartic acid, glutamic acid, threonine, or 
proline, n is at least 15, and z is between 1 and 20. 

[00129] The number of these repeats can be any number between 10 and 100. Products of the invention may contain 
URP sequences that are semi-random sequences. Examples are semi-random sequences containing at least 
30, 40, 50, 60 or 70% glycine in which the glycines are well dispersed and in which the total concentration 
of tryptophan, phenylalanine, tyrosine, valine, leucine, and isoleucine is less than 70, 60, 50, 40, 30, 20, or 
10% when combined. A preferred semi-random URP sequence contains at least 40% glycine and the total 
concentration of tryptophan, phenylalanine, tyrosine, valine, leucine, and isoleucine is less than 10%. A 
more preferred random URP sequence contains at least 50% glycine and the total concentration of 
tryptophan, phenylalanine, tyrosine, valine, leucine, and isoleucine is less than 5%. URP sequences can be 
designed by combining the sequences of two or more shorter URP sequences or fragments of URP 
sequences. Such a combination allows one to better modulate the pharmaceutical properties of the product 
containing the URP sequences and it allows one to reduce the repetitiveness of the DNA sequences 
encoding the URP sequences, which can improve expression and reduce recombination of the URP 
encoding sequences. 
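
The composition thresholds for semi-random URP sequences given above can be expressed as a short check. The following Python sketch is illustrative only and is not part of the original disclosure. The function names and default thresholds (at least 40% glycine, less than 10% combined W, F, Y, V, L, and I) are assumptions that mirror the preferred embodiment described above. The sketch does not test whether the glycines are well dispersed.

```python
HYDROPHOBIC = set("WFYVLI")  # Trp, Phe, Tyr, Val, Leu, Ile


def composition(seq: str) -> tuple:
    """Return (glycine fraction, combined W/F/Y/V/L/I fraction) of a sequence."""
    seq = seq.upper()
    n = len(seq)
    if n == 0:
        raise ValueError("sequence must not be empty")
    gly = seq.count("G") / n
    hydrophobic = sum(1 for aa in seq if aa in HYDROPHOBIC) / n
    return gly, hydrophobic


def meets_semi_random_criteria(seq: str,
                               min_gly: float = 0.40,
                               max_hydrophobic: float = 0.10) -> bool:
    """True if glycine is at least min_gly and hydrophobics are below the cap."""
    gly, hydrophobic = composition(seq)
    return gly >= min_gly and hydrophobic < max_hydrophobic
```

Calling `meets_semi_random_criteria(seq, 0.50, 0.05)` applies the more preferred thresholds of at least 50% glycine and less than 5% of the listed hydrophobic residues.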

[00130] URP sequences can be designed and selected to possess several of the following desired properties: a) high 
genetic stability of the coding sequences in the production host, b) high level of expression, c) low 
(predicted/calculated) immunogenicity, d) high stability in presence of serum proteases and/or other tissue 
proteases, e) large hydrodynamic radius under physiological conditions. One exemplary approach to obtain 
URP sequences that meet multiple criteria is to construct a library of candidate sequences and to identify 
from the library the suitable subsequences. Libraries can comprise random and/or semi-random sequences. 
Of particular utility are codon libraries, which are libraries of DNA molecules that contain multiple codons 
for the identical amino acid residue. Codon randomization can be applied to selected amino acid positions 
of a certain type or to most or all positions. True codon libraries encode only a single amino acid sequence, 
but they can easily be combined with amino acid libraries, which are populations of DNA molecules 
encoding a mixture of (related or unrelated) amino acids at the same residue position. Codon libraries allow 
the identification of genes that have relatively low repetitiveness at the DNA level but that encode highly 
repetitive amino acid sequences. This is useful because repetitive DNA sequences tend to recombine, 
leading to instability. One can also construct codon libraries that encode limited amino acid diversity. Such 
libraries allow introduction of a limited number of amino acids in some positions of the sequence while 
other positions allow for codon variation but all codons encode the same amino acid. One can synthesize 
partially random oligonucleotides by incorporating mixtures of nucleotides at the same position during 
oligonucleotide synthesis. Such partially random oligonucleotides can be fused by overlap PCR or 
ligation-based approaches. In particular, one can multimerize semi-random oligonucleotides that encode 
glycine-rich sequences. These oligonucleotides can differ in length and sequences and codon usage. As a 
result, one obtains a library of candidate URP sequences. Another method to generate libraries is to 
synthesize a starting sequence and subsequently subject said sequence to partial randomization. This can 
be done by cultivation of the gene encoding the URP sequences in a mutator strain or by amplification of 
the encoding gene under mutagenic conditions (Leung, D., et al. (1989) Technique, 1: 11-15). URP 
sequences with desirable properties can be identified from libraries using a variety of methods. Sequences 
that have a high degree of genetic stability can be enriched by cultivating the library in a production host. 
Sequences that are unstable will accumulate mutations, which can be identified by DNA sequencing. 
Variants of URP sequences that can be expressed at high level can be identified by screening or selection 
using multiple protocols known to someone skilled in the art. For instance one can cultivate multiple 
isolates from a library and compare expression levels. Expression levels can be measured by gel analysis, 
analytical chromatography, or various ELISA-based methods. The determination of expression levels of 

individual sequence variants can be facilitated by fusing the library of candidate URP sequences to
sequence tags like myc-tag, His-tag, HA-tag. Another approach is to fuse the library to an enzyme or other 
reporter protein like green fluorescent protein. Of particular interest is the fusion of the library to a 
selectable marker like beta-lactamase or kanamycin-acyl transferase. One can use antibiotic selection to 
enrich for variants with high level of expression and good genetic stability. Variants with good protease
resistance can be identified by screening for intact sequences after incubation with proteases. An effective 
way to identify protease-resistant URP sequences is bacterial phage display or related display methods. 
Multiple systems have been described where sequences that undergo rapid proteolysis can be enriched by 
phage display. These methods can be easily adapted to enrich for protease-resistant sequences. For
example, one can clone a library of candidate URP sequences between an affinity tag and the pIII protein of
M13 phage. The library can then be exposed to proteases or protease-containing biological samples like
blood or lysosomal preparations. Phage that contain protease-resistant sequences can be captured after 
protease treatment by binding to the affinity tag. Sequences that resist degradation by lysosomal 
preparations are of particular interest because lysosomal degradation is a key step during antigen 
presentation in dendritic and other antigen presenting cells. Phage display can be utilized to identify
candidate URP sequences that do not bind to a particular immune serum in order to identify URP sequences 
with low immunogenicity. One can immunize animals with a candidate URP sequence or with a library of 
URP sequences to raise antibodies against the URP sequences in the library. The resulting serum can then 
be used for phage panning to remove or identify sequences that are recognized by antibodies in the 
resulting immune serum. Other methods like bacterial display, yeast display, ribosomal display can be
utilized to identify variants of URP sequences with desirable properties. Another approach is the 
identification of URP sequences of interest by mass spectrometry. For instance, one can incubate a library 
of candidate URP sequences with a protease or biological sample of interest and identify sequences that 
resist degradation by mass spectrometry. In a similar approach one can identify URP sequences that 
facilitate oral uptake. One can feed a mixture of candidate URP sequences to animals or humans and
identify variants with the highest transfer or uptake efficiency across some tissue barrier (e.g., dermal) by
mass spectrometry. In a similar way, one can identify URP sequences that favor other uptake mechanisms 
like pulmonary, intranasal, rectal, transdermal delivery. One can also identify URP sequences that favor 
cellular uptake or URP sequences that resist cellular uptake. 
[00131] URP sequences can be designed by combining URP sequences or fragments of URP sequences that were
designed by any of the methods described above. In addition, one can apply semi-random approaches to 
optimize sequences that were designed based on the rules described above. Of particular interest is codon 
optimization with the goal of improving expression of the enhanced proteins and improving the genetic
stability of the encoding gene in the production hosts. Codon optimization is of particular importance for
URP sequences that are rich in glycine or that have very repetitive amino acid sequences. Codon
optimization can be performed using computer programs (Gustafsson, C., et al. (2004) Trends Biotechnol,
22: 346-53), some of which minimize ribosomal pausing (Coda Genomics Inc.). When designing URP
sequences one can consider a number of properties. One can minimize the repetitiveness in the encoding
DNA sequences. In addition, one can avoid or minimize the use of codons that are rarely used by the
production host (e.g., the AGG and AGA arginine codons and one leucine codon in E. coli). DNA sequences
that encode a high level of glycine tend to have a high GC content that can lead to instability or low
expression levels. Thus, when possible it is preferred to choose codons such that the GC-content of the
URP-encoding sequence is suitable for the production organism that will be used to manufacture the URP.
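
The design properties listed above lend themselves to a simple mechanical check. The following is a minimal sketch only, not part of the disclosed methods: the function names are hypothetical, and the rare-codon set contains just the two arginine codons named above (the leucine codon is not specified in the text and is therefore omitted).

```python
# Hypothetical helper names; only AGG and AGA are taken from the text above.
RARE_CODONS_E_COLI = {"AGG", "AGA"}


def split_codons(dna):
    """Split an in-frame coding sequence into upper-case triplets."""
    dna = dna.upper()
    if len(dna) % 3 != 0:
        raise ValueError("sequence length must be a multiple of 3")
    return [dna[i:i + 3] for i in range(0, len(dna), 3)]


def gc_content(dna):
    """Fraction of G and C nucleotides in the sequence."""
    dna = dna.upper()
    return (dna.count("G") + dna.count("C")) / len(dna)


def count_rare_codons(dna, rare=RARE_CODONS_E_COLI):
    """Number of codons that belong to the supplied rare-codon set."""
    return sum(1 for codon in split_codons(dna) if codon in rare)
```

A candidate coding sequence could be scored with `gc_content` and `count_rare_codons` before synthesis; the acceptable thresholds depend on the production organism and are not given here.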
[00132] URP encoding genes can be made in one or more steps, either fully synthetically or by synthesis combined 
with enzymatic processes, such as restriction enzyme-mediated cloning, PCR and overlap extension. URP
modules can be constructed such that the URP module-encoding gene has low repetitiveness while the
encoded amino acid sequence has a high degree of repetitiveness. The approach is illustrated in Figure 11.
As a first step, one constructs a library of relatively short URP sequences. This can be a pure codon library 
such that each library member has the same amino acid sequence but many different coding sequences are 
possible. To facilitate the identification of well-expressing library members one can construct the library as
a fusion to a reporter protein. Examples of suitable reporter genes are green fluorescent protein, luciferase,
alkaline phosphatase, and beta-galactosidase. By screening one can identify short URP sequences that can be
expressed in high concentration in the host organism of choice. Subsequently, one can generate a library of 
random URP dimers and repeat the screen for high level of expression. Dimerization can be performed by 
ligation, overlap extension or similar cloning techniques. This process of dimerization and subsequent
screening can be repeated multiple times until the resulting URP sequence has reached the desired length.
Optionally, one can sequence clones in the library to eliminate isolates that contain undesirable sequences.
The initial library of short URP sequences can allow some variation in amino acid sequence. For instance 
one can randomize some codons such that a number of hydrophilic amino acids can occur in said position. 
During the process of iterative multimerization one can screen library members for other characteristics
like solubility or protease resistance in addition to a screen for high-level expression. Instead of dimerizing
URP sequences one can also generate longer multimers. This allows one to increase the length of
URP modules more quickly.
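
As a rough illustration of how quickly iterative multimerization reaches a desired length, the sketch below counts the rounds required. It is an assumption-laden simplification: the function name is hypothetical, each round is taken to multiply the module length by a fixed factor (2 for dimerization), and any linker or cloning-site residues are ignored.

```python
def rounds_needed(start_len, target_len, fold=2):
    """Count multimerization rounds until the module reaches target_len.

    Each round multiplies the current length by `fold` (2 = dimerization).
    """
    if start_len <= 0 or fold < 2:
        raise ValueError("start_len must be positive and fold at least 2")
    rounds = 0
    length = start_len
    while length < target_len:
        length *= fold
        rounds += 1
    return rounds
```

For instance, a 36-residue starting module reaches 288 residues after three rounds of dimerization (36, 72, 144, 288), whereas trimerization exceeds that length after two rounds.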

[00133] Many URP sequences contain particular amino acids at high fraction. Such sequences can be difficult to
produce by recombinant techniques as their coding genes can contain repetitive sequences that are subject
to recombination. Furthermore, genes that contain particular codons at very high frequencies can limit
expression as the respective loaded tRNAs in the production host become limiting. An example is the
recombinant production of GRS. Glycine residues are encoded by 4 triplets, GGG, GGC, GGA, and GGT.
As a result, genes encoding GRS tend to have high GC-content and tend to be particularly repetitive. An 
additional challenge can result from codon bias of the production host. In the case of E. coli, two glycine
codons, GGA and GGG, are rarely used in highly expressed proteins. Thus codon optimization of the gene
encoding URP sequences can be very desirable. One can optimize codon usage by employing computer
programs that consider codon bias of the production host (Gustafsson, C., et al. (2004) Trends Biotechnol,
22: 346-53). As an alternative, one can construct codon libraries where all members of the library encode
the same amino acid sequence but where codon usage is varied. Such libraries can be screened for highly
expressing and genetically stable members, which are particularly suitable for the large-scale production of
URP-containing products.
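
The idea of a codon library, in which every member encodes the same amino acid sequence with different codons, can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not the procedure of the disclosure: the function names are hypothetical, the codon table covers only glycine, serine and glutamate (taken from the standard genetic code), and "repetitiveness" is approximated here by the length of the longest substring that occurs at least twice.

```python
import random

# Synonymous codons from the standard genetic code for three residues only.
SYNONYMOUS = {
    "G": ["GGT", "GGC", "GGA", "GGG"],
    "S": ["TCT", "TCC", "TCA", "TCG", "AGT", "AGC"],
    "E": ["GAA", "GAG"],
}


def library_size(peptide):
    """Number of distinct DNA sequences that encode the peptide."""
    size = 1
    for aa in peptide:
        size *= len(SYNONYMOUS[aa])
    return size


def random_member(peptide, rng):
    """Draw one library member by choosing a codon for each residue."""
    return "".join(rng.choice(SYNONYMOUS[aa]) for aa in peptide)


def translate(dna):
    """Translate back with the same table (a consistency check)."""
    lookup = {c: aa for aa, codons in SYNONYMOUS.items() for c in codons}
    return "".join(lookup[dna[i:i + 3]] for i in range(0, len(dna), 3))


def longest_repeat(dna):
    """Length of the longest substring that occurs at least twice."""
    best = 0
    for k in range(1, len(dna)):
        seen = set()
        found = False
        for i in range(len(dna) - k + 1):
            kmer = dna[i:i + k]
            if kmer in seen:
                found = True
                break
            seen.add(kmer)
        if not found:
            break  # no repeat of length k implies none of length k + 1
        best = k
    return best
```

Members drawn with `random_member` all translate to the same peptide, and `longest_repeat` offers one crude way to rank them by DNA-level repetitiveness before screening for expression and genetic stability.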

Multivalent Unstructured Recombinant Proteins (MURPs): 

[00134] As noted above, the subject URPs are particularly useful as modules for design of proteins of therapeutic
value. Accordingly, the present invention provides proteins comprising one or more subject URPs. Such
proteins are termed herein Multivalent Unstructured Recombinant Proteins (MURPs).

[00135] To construct MURPs, one or more URP sequences can be fused to the N-terminus or C-terminus of a
protein or inserted in the middle of the protein, e.g., into loops of a protein or in between modules of the
protein of interest, to give the resulting modified protein improved properties relative to the unmodified 
protein. The combined length of URP sequences that are attached to a protein can be 40, 50, 60, 70, 80, 90, 
100, 150, 200 or more amino acids. 

[00136] The subject MURPs exhibit one or more improved properties as detailed below. 

Improved half-life: 

[00137] Adding a URP sequence to a pharmaceutically active protein can improve many properties of that protein.
In particular, adding a long URP sequence can significantly increase the serum half-life of the protein.
Such URPs typically contain amino acid sequences of at least about 40, 50, 60, 70, 80, 90, 100, 150, 200 or
more amino acids. 

[00138] The URPs can be fragmented such that the resulting protein contains multiple URPs, or multiple fragments 
of URPs. Some or all of these individual URP sequences may be shorter than 40 amino acids as long as the
combined length of all URP sequences in the resulting protein is at least 30 amino acids. Preferably, the
resulting protein has a combined length of URP sequences exceeding 40, 50, 60, 70, 80, 90, 100, 150, 200
or more amino acids. In one aspect, the fused URPs can increase the hydrodynamic radius of a protein and
thus reduce its clearance from the blood by the kidney. The increase in the hydrodynamic radius of the
resulting fusion protein relative to the unmodified protein can be detected by ultracentrifugation, size
exclusion chromatography, or light scattering. 
Improved tissue selectivity: 

[00139] Increasing the hydrodynamic radius can also lead to reduced penetration into tissues, which can be 
exploited to minimize side effects of a pharmaceutically active protein. It is well documented that
hydrophilic polymers have a tendency to accumulate selectively in tumor tissue which is caused by the 
enhanced permeability and retention (EPR) effect. The underlying cause of the EPR effect is the leaky 
nature of tumor vasculature (McDonald, D. M., et al. (2002) Cancer Res, 62: 5381-5) and the lack of 
lymphatic drainage in tumor tissues. Therefore, the selectivity of pharmaceutically active proteins for 
tumor tissues can be enhanced by adding hydrophilic polymers. As such, the therapeutic index of a given 
pharmaceutically active protein can be increased via incorporating the subject URPs.
Protection from degradation and reduced immunogenicity: 

[00140] Adding URP sequences can significantly improve the protease resistance of a protein. URP sequences 

themselves can be designed to be protease resistant and by attaching them to a protein one can shield that 
protein from the access of degrading enzymes. URP sequences can be added to pharmaceutically active 
proteins with the goal of reducing undesirable interactions of the protein with other receptors or surfaces. 
To achieve this, it can be beneficial to add the URP sequences to the pharmaceutically active protein in 
proximity to the site of the protein that makes such undesirable contacts. In particular, one can add URP 
sequences to pharmaceutically active proteins with the goal of reducing their interactions with any 
component of the immune system to prevent an immune response against the product of the invention. 
Adding a URP sequence to a pharmaceutically active protein can reduce interaction with pre-existing 
antibodies or B-cell receptors. Furthermore, the addition of URP sequences can reduce the uptake and 
processing of the product of the invention by antigen presenting cells. Adding one or more URP sequences
to a protein is a preferred way of reducing its immunogenicity as it will suppress an immune response in
many species, allowing one to predict the expected immunogenicity of a product in patients based on animal
data. Such species-independent testing of immunogenicity is not possible for approaches that are based on
the identification and removal of human T cell epitopes or sequence comparison with human sequences.
Interruption of T cell epitopes: 
[00141] URP sequences can be introduced into proteins in order to interrupt T cell epitopes. This is particularly 
useful for proteins that combine multiple separate functional modules. The formation of T cell epitopes 
requires that peptide fragments of a protein antigen bind to MHC. MHC molecules interact with a short 
segment of amino acids, typically 9 contiguous residues, of the presented peptides. The direct fusion of
different binding modules in a protein molecule can lead to T cell epitopes that span two neighboring
domains. Separating the functional modules by URP modules prevents the generation of such
module-spanning T cell epitopes as illustrated in Figure 7. The insertion of URP sequences between functional
modules can also interfere with proteolytic processing in antigen presenting cells, which will lead to an 
additional reduction of immunogenicity. Another approach to reduce the risk of immunogenicity is to 
disrupt T cell epitopes within functional modules of a product. In the case of microproteins, one approach 
is to have some of the intercysteine loops (those that are not involved in target binding) be glycine-rich. In 
microproteins, whose structure is due to a small number of cysteines, one could in fact replace most or all 
of the residues that are not involved in target binding with glycine, serine, glutamate, or threonine, thus
reducing the potential for immunogenicity while not affecting the affinity for the target. For instance, this
can be carried out by performing a 'glycine-scan' of all residues, in which each residue is replaced by a
glycine, then selecting the clones which retain target binding using phage display or screening, and then
combining all of the glycine substitutions that are permitted. In general, functional modules have a much
higher probability to contain T cell epitopes than URP modules. One can reduce the frequency of T cell
epitopes in functional modules by replacing all or many non-critical amino acid residues with small
hydrophilic residues like gly, ser, ala, glu, asp, asn, gln, thr. Positions in a functional module that allow
replacement can be identified using a variety of random or structure based protein engineering approaches. 
Improved solubility: 

[00142] Functional modules of a protein can have limited solubility. In particular, binding modules tend to carry 
hydrophobic residues on their surface, which can limit their solubility and can lead to aggregation. By 
spacing or flanking such functional modules with URP modules one can improve the overall solubility of 
the resulting product. This is in particular true for URP modules that carry a significant percentage of 
hydrophilic or charged residues. By separating functional modules with soluble URP modules one can 
reduce intramolecular interactions between these functional modules.
Improved pH profile and homogeneity of product charge: 

[00143] URP sequences can be designed to carry an excess of negative or positive charges. As a result they confer 
an electrostatic field to any fusion partner which can be utilized to shift the pH profile of an enzyme or a 
binding interaction. Furthermore, the electrostatic field of a charged URP sequence can increase the 
homogeneity of pKa values of surface charges of a protein product, which leads to sharpened pH profiles of 
ligand interactions and to sharpened separations by isoelectric focusing or chromatofocusing. 
Improved purification properties due to sharper product pKa: 

[00144] Each amino acid in solution by itself has a single, fixed pKa, which is the pH at which its functional groups 
are half protonated. In a typical protein there are many types of residues and, due to proximity and protein
breathing effects, they also change each other's effective pKa in variable ways. Because of this, over a wide
range of pH conditions, typical proteins can adopt hundreds of differently ionized species, each with a
different molecular weight and net charge, due to large numbers of combinations of charged and neutral
amino acid residues. This is referred to as a broad ionization spectrum and makes the analysis (e.g., mass
spectrometry) and purification of such proteins more difficult.
[00145] PEG is uncharged and does not affect the ionization spectrum of the protein it is attached to, leaving it with
a broad ionization spectrum. However, a URP with a high content of Gly and Glu in principle exists in
only two states: neutral (-COOH) when the pH is below the pKa of glutamate and negatively charged
(-COO⁻) when the pH is above the pKa of glutamate. URP modules can form a single, homogeneously
ionized type of molecule and can yield a single mass in mass spectrometry.
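
The two-state description above can be made quantitative with the Henderson-Hasselbalch relation for a single acidic group. The sketch below is illustrative only: the function name is hypothetical, and the default pKa of 4.25 is an assumed, commonly cited approximate value for the glutamate side chain rather than a figure taken from this document.

```python
def fraction_deprotonated(ph, pka=4.25):
    """Fraction of an acidic group in the charged (-COO-) form.

    Henderson-Hasselbalch for one ionizable group; pka is an assumed value.
    """
    return 1.0 / (1.0 + 10.0 ** (pka - ph))
```

Under this assumption the group is half ionized at pH equal to the pKa, almost fully protonated two or more pH units below it, and almost fully charged two or more units above it, which is the sense in which residues sharing one pKa behave as a single population.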

[00146] Where desired, MURPs can be expressed as a fusion with a URP having a single type of charge (Glu)
distributed at constant spacing through the URP module. One may choose to incorporate 25-50 Glu
residues per 20 kD of URP and all of these 25-50 residues would have very similar pKa.
[00147] In addition, adding 25-50 negative charges to a small protein like IFN, hGH or GCSF (with only 20
charged residues) will increase the charge homogeneity of the product and sharpen its isoelectric point,
which will be very close to the pKa of free glutamate.

[00148] The increase in the homogeneity of the charge of the protein population has favorable processing 
properties, such as in ion exchange, isoelectric focusing, mass spectrometry, etc., compared to traditional
PEGylation.

Improved formulation and/or delivery: 

[00149] Addition of URP sequences to pharmaceutically active proteins can significantly simplify the formulation
and/or the delivery of the resulting products. URP sequences can be designed to be very hydrophilic and as
a result they improve the solubility of (for example) human proteins, which often contain hydrophobic
patches that they use to bind to other human proteins. The formulation of such human proteins, like
antibodies, can be quite challenging and often limits their concentration and delivery options. URPs can
reduce product precipitation and aggregation, which allows one to use simpler formulations containing
fewer of the ingredients that are typically needed to stabilize a product in solution. The improved solubility of
URP sequence-containing products allows one to formulate these products at higher concentration and as a
result one can reduce the injection volume for injectable products, which may enable home injection, which
is limited to a very low injected volume. Addition of a URP sequence can also simplify the storage of the
resulting formulated products. URP sequences can be added to pharmaceutically active proteins to
facilitate their oral, pulmonary, rectal, or intranasal uptake. URP sequences can facilitate various modes of
delivery because they allow higher product concentrations and improved product stability. Additional 
improvements can be achieved by designing URP sequences that facilitate membrane penetration. 
Improved production: 

[00150] Adding URP sequences can have significant benefits for the production of the resulting product. Many
recombinant products, especially native human proteins, have a tendency to form aggregates during
production that can be difficult or impossible to dissolve and even when removed from the final product
they may re-occur. These are usually due to hydrophobic patches by which these (native human) proteins
contact other (native human) proteins, and mutating these residues is considered risky because of
immunogenicity. However, URPs can increase the hydrophilicity of such proteins and enable their
formulation without mutating the sequence of the human protein. URP sequences can facilitate the folding
of a protein to reach its native state. Many pharmaceutically active proteins are produced by recombinant
methods in a non-native aggregated state. These products need to be denatured and subsequently they are
incubated under conditions that allow the proteins to fold into their native active state. A frequent side
reaction during renaturation is the formation of aggregates. The fusion of URP sequences to a protein
significantly reduces its tendency to form aggregates and thus it facilitates the folding of the
pharmaceutically active component of the product. URP-containing products are much easier to prepare as
compared to polymer-modified proteins. Chemical polymer-modification requires extra modification and
purification steps after the active protein has been purified. In contrast, URP sequences can be 
manufactured using recombinant DNA methods together with the pharmaceutically active protein. The 
products of the invention are also significantly easier to characterize compared to polymer-modified
products. Due to the recombinant production process one can obtain more homogeneous products with
defined molecular characteristics. URP sequences can also facilitate the purification of a product. For
instance URP sequences can include subsequences that can be captured by affinity chromatography.
Examples are sequences rich in histidine, which can be captured on resins with immobilized metals like
nickel. URP sequences can also be designed to have an excess of negatively or positively charged amino
acids. As a result they can significantly impact the net charge of a product, which can facilitate product
purification by ion-exchange chromatography or preparative electrophoresis.

[00151] The subject MURPs can contain a variety of modules, including but not limited to binding modules,
effector modules, multimerization modules, C-terminal modules, and N-terminal modules. Figure 1 depicts
an exemplary MURP having multiple modules. However, MURPs can also have relatively simple
architectures that are illustrated in Fig. 2. MURPs can also contain fragmentation sites. These can be
protease-sensitive sequences or chemically sensitive sequences that can be preferentially cleaved when the 
MURPs reach their target site. 

Binding Module (BM):

[00152] The MURPs of the present invention may comprise one or more binding modules. Binding
module (BM) refers to a peptide or protein sequence that can bind specifically to one or several targets,
which may be one or more therapeutic targets or accessory targets, such as for cell-, tissue- or organ
targeting. BMs can be linear or cyclic peptides, cysteine-constrained peptides, microproteins, scaffold
proteins (e.g., fibronectin, ankyrins, crystallins, streptavidin, antibody fragments, domain antibodies),
peptidic hormones, growth factors, cytokines, or any type of protein domain, human or non-human, natural
or non-natural, and they may be based on a natural scaffold or not based on a natural scaffold, or based on
combinations or they may be fragments of any of the above. Optionally, these BMs can be engineered by 
adding, removing or replacing one or multiple amino acids in order to enhance their binding properties, 
their stability, or other properties. Binding modules can be obtained from natural proteins, by design or by
genetic package display, including phage display, cellular display, ribosomal display or other display
methods. Binding modules may bind to the same copy of the same target, which results in avidity, or they
may bind to different copies of the same target (which can result in avidity if these copies are somehow
connected or linked, such as by a cell membrane), or they may bind to two unrelated targets (which yields
avidity if these targets are somehow linked, such as by a membrane). Binding modules can be identified by
screening or otherwise analyzing random libraries of peptides or proteins.

[00153] Particularly desirable binding modules are those that, upon incorporation into a MURP, yield a
desirable Tepitope score. The Tepitope score of a protein is the log of the Kd (dissociation constant,
affinity, off-rate) of the binding of that protein to multiple of the most common human MHC alleles, as
disclosed in Sturniolo, T. et al. (1999) Nature Biotechnology 17:555. The score ranges over at least 15
logs, from about 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0, -1, -2, -3, -4, -5 (10e 10 Kd) to about -5. Preferred MURPs
yield a score less than about -3.5 [KKW: On absolute scale?]
[00154] Of particular interest are also binding modules comprising disulfide bonds formed by pairing two cysteine

residues. In certain embodiments, the binding modules comprise polypeptides having high cysteine content 
or high disulfide density (HDD). Binding modules of the HDD family typically have 5-50% (5, 6, 7, 8, 9, 
10, 12, 14, 16, 18, 20, 25, 30, 35, 40, 45 or 50%) cysteine residues and each domain typically contains at 
least two disulfides and optionally a co-factor such as calcium or another ion. 
[00155] The presence of an HDD scaffold allows these modules to be small but still adopt a relatively rigid structure.
Rigidity is important to obtain high binding affinities, resistance to proteases and heat, including the
proteases involved in antigen processing, and thus contributes to the low or non-immunogenicity of these
modules. The disulfide framework folds the modules without the need for a large number of hydrophobic
side chain interactions in the interior of most modules. The small size is also advantageous for fast tissue
penetration and for alternative delivery such as oral, nasal, intestinal, pulmonary, blood-brain-barrier, etc.
In addition, the small size also helps to reduce immunogenicity. A higher disulfide density is obtainable,
either by increasing the number of disulfides or by using domains with the same number of disulfides but 
fewer amino acids. It is also desirable to decrease the number of non-cysteine fixed residues, so that a 
higher percentage of amino acids is available for target binding. 

[00156] The cysteine-containing binding modules can adopt a wide range of disulfide bonding patterns (DBPs).
For example, two-disulfide modules can have three different DBPs, three-disulfide modules can have
15 different DBPs and four-disulfide modules have up to 105 different DBPs.
Natural examples exist for all of the 2SS DBPs, the majority of the 3SS DBPs and less than half of the 4SS
DBPs. In one aspect, the total number of disulfide bonding patterns can be calculated according to the
formula $\prod_{i=1}^{n}(2i-1)$, wherein n = the predicted number
of disulfide bonds formed by the cysteine residues, and wherein $\prod_{i=1}^{n}(2i-1)$
represents the product of (2i-1), where i is a positive integer ranging from 1 up to n.
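
The counts quoted above (3, 15 and 105 patterns for two, three and four disulfides) can be reproduced by evaluating the product of (2i-1) and, independently, by enumerating every way of pairing the cysteines. The following is a minimal sketch with hypothetical function names; cysteines are numbered from the N-terminus as in the text.

```python
def count_patterns(n):
    """Product of (2i - 1) for i = 1..n, for n disulfide bonds."""
    total = 1
    for i in range(1, n + 1):
        total *= 2 * i - 1
    return total


def enumerate_patterns(cysteines):
    """Yield every way to pair up the given list of cysteine positions."""
    if not cysteines:
        yield []
        return
    first, rest = cysteines[0], cysteines[1:]
    for idx, partner in enumerate(rest):
        remaining = rest[:idx] + rest[idx + 1:]
        for pattern in enumerate_patterns(remaining):
            yield [(first, partner)] + pattern


def label(pattern):
    """Format a pattern as, e.g., '1-2,3-4'."""
    return ",".join(f"{a}-{b}" for a, b in pattern)
```

Calling `enumerate_patterns([1, 2, 3, 4])` yields the three two-disulfide patterns 1-2,3-4; 1-3,2-4; and 1-4,2-3, and the same function applied to six or eight cysteines yields 15 and 105 patterns respectively.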
[00157] Accordingly, in one embodiment, the modules used in MURPs are natural or non-naturally occurring
cysteine (C)-containing scaffolds exhibiting a binding specificity towards a target molecule, wherein the
non-naturally occurring cysteine (C)-containing scaffold comprises intra-scaffold cysteines according to a
pattern selected from the group of permutations represented by the formula $\prod_{i=1}^{n}(2i-1)$,
wherein n equals the predicted number of disulfide bonds formed by
the cysteine residues, and wherein $\prod_{i=1}^{n}(2i-1)$ represents
the product of (2i-1), where i is a positive integer ranging from 1 up to n. In one aspect, the natural or
non-naturally occurring cysteine (C)-containing module comprises a polypeptide having two disulfide bonds
formed by pairing cysteines contained in the polypeptide according to a pattern selected from the group
consisting of $C^{1-2,3-4}$, $C^{1-3,2-4}$ and $C^{1-4,2-3}$, wherein the two numerical numbers linked by a hyphen indicate
which two cysteines counting from the N-terminus of the polypeptide are paired to form a disulfide bond. In
another aspect, the natural or non-naturally occurring cysteine (C)-containing module comprises a
polypeptide having three disulfide bonds formed by pairing intra-scaffold cysteines according to a pattern
selected from the group consisting of $C^{1-2,3-4,5-6}$, $C^{1-2,3-5,4-6}$, $C^{1-2,3-6,4-5}$, $C^{1-3,2-4,5-6}$,
$C^{1-3,2-5,4-6}$, $C^{1-3,2-6,4-5}$, $C^{1-4,2-3,5-6}$, $C^{1-4,2-5,3-6}$, $C^{1-4,2-6,3-5}$, $C^{1-5,2-3,4-6}$,
$C^{1-5,2-4,3-6}$, $C^{1-5,2-6,3-4}$, $C^{1-6,2-3,4-5}$, $C^{1-6,2-4,3-5}$, and $C^{1-6,2-5,3-4}$, wherein the two numerical
numbers linked by a hyphen indicate which two cysteines counting from the N-terminus of the polypeptide are
paired to form a disulfide bond. In yet another aspect, the natural or non-naturally occurring cysteine
(C)-containing module comprises a polypeptide having at least four disulfide bonds formed by pairing cysteines
contained in the polypeptide according to a pattern selected from the group of permutations defined by the 
formula above. In yet another aspect, the natural or non-naturally occurring cysteine (C)-containing 
module comprises a polypeptide having at least five, six, or more disulfide bonds formed by pairing intra- 
protein cysteines according to a pattern selected from the group of permutations represented by the formula 
above. Any of the cysteine-containing proteins or scaffolds disclosed in the co-pending applications [serial
nos. 11/528,927 and 11/528,950, which are incorporated herein by reference in their entirety] are candidate
binding modules. 

[00158] Binding modules can also be selected from libraries of cysteine-constrained cyclic peptides with 4, 5, 6, 7,
8, 9, 10, 11 or 12 randomized or partially randomized amino acids between the disulfide-bonded cysteines
(e.g., in a build-up manner), and in some cases additional randomized amino acids on the outside of the
cysteine pair; such libraries can be constructed using a variety of methods. Library members with specificity for a target
of interest can be identified using various methods including phage display, ribosomal display, yeast 
display and other methods known in the art. Such cyclic peptides can be utilized as binding modules in 
MURPs. In a preferred embodiment one can further engineer cysteine-constrained peptides to increase 
their binding affinity, proteolytic stability, and/or specificity using buildup approaches that lead to binding
modules containing more than one disulfide bond. One particular buildup approach is illustrated in Fig. 25.
It is based on the addition of a single cysteine plus multiple randomized residues on the N-terminal side of
the previously selected cyclic peptide, as well as on the C-terminal side. One can generate libraries that 
have been designed as illustrated in Fig. 25. Binding modules with improved properties can be identified 
by phage display or similar methods. Such buildup libraries can contain between 1 and 12 random 
positions on the N-terminal as well as on the C-terminal side of a cyclic peptide. The distance between the 
cysteine residues in the newly added random flanks and the cysteine residues in the cyclic peptide can be 
varied between 1 and 12 residues. Such libraries will contain four cysteine residues per library member, 
with two cysteines resulting from the original cyclic peptide and two cysteine residues in the newly added 
flanks. This approach favors a 1-4 2-3 DBP or a change in DBP, breaking up the preexisting 1-2 disulfide
(= 2-3 in the 4-cysteine construct) to form a 1-2 3-4 or a 1-3 2-4 DBP. Such buildup approaches can be 
performed with clone-specific primers so that it leaves no fixed sequence between the library areas as 
shown in Fig. 25, or it can be performed with primers that use (and thus leave) a fixed sequence on both 
sides of the previously selected peptide and therefore these same primers can be used for any previously 
selected clone as illustrated in Fig. 26. The method illustrated in Fig. 26 can be applied to a collection of 
cyclic peptides with specificity for a target of interest. Both buildup approaches were shown to work for
anti-VEGF affinity maturation by build-up. This approach can be repeated to generate binding modules 
with six or more cysteine residues. 
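
The flank layout described in this paragraph can be sketched as a library template. The Python below is a minimal illustration under stated assumptions, not the procedure of Fig. 25: `buildup_template` is a hypothetical helper, `X` stands for a randomized position, and the gap arguments count the randomized residues placed between each newly added cysteine and the previously selected peptide.

```python
def buildup_template(cyclic_peptide: str, gap_n: int, gap_c: int,
                     outer_n: int = 0, outer_c: int = 0) -> str:
    """Return a 4-cysteine library template around a selected 2-cysteine peptide.

    'X' marks a randomized position. gap_n and gap_c are the numbers of
    randomized residues between each new cysteine and the selected peptide
    (1 to 12 each); outer_n and outer_c are optional randomized residues
    beyond the new cysteines.
    """
    if cyclic_peptide.count("C") != 2:
        raise ValueError("the selected cyclic peptide must contain exactly two cysteines")
    for gap in (gap_n, gap_c):
        if not 1 <= gap <= 12:
            raise ValueError("each gap must be between 1 and 12 residues")
    if outer_n < 0 or outer_c < 0:
        raise ValueError("outer random positions cannot be negative")

    n_flank = "X" * outer_n + "C" + "X" * gap_n
    c_flank = "X" * gap_c + "C" + "X" * outer_c
    return n_flank + cyclic_peptide + c_flank


if __name__ == "__main__":
    # "CAAAAC" is a placeholder sequence, not a peptide from the source.
    print(buildup_template("CAAAAC", gap_n=3, gap_c=2))
```

Every template built this way contains four cysteines, two from the original cyclic peptide and two from the new flanks, which matches the library composition described above.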

[00159] Another buildup of a one-disulfide into a 2-disulfide sequence is illustrated in Fig. 27. It involves the
dimerization of a previously selected pool of 1-disulfide peptides with itself so that the preselected peptide
pool ends up in the N-terminal as well as in the C-terminal position. This approach favors the build up of 
2-disulfide sequences that recognize two separate epitopes on a target. 

[00160] Another buildup approach involves the addition of a (partially) randomized sequence of 6-15 residues 
containing two cysteines that are spaced 4, 5, 6, 7, 8, 9, or 10 amino acids apart, with optionally additional
randomized positions outside the linked cysteines. This 2-cysteine random sequence is added on the N-
terminal side of the previously selected peptide, or on the C-terminal side. This approach favors a 1-2 3-4
DBP, although other DBPs may be formed. This approach can be repeated to generate binding modules 
with six or more cysteine residues. 
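
The design space of such 2-cysteine add-on sequences can be enumerated directly. The sketch below is an illustration only; `two_cysteine_layouts` is a hypothetical name, `X` marks a randomized position, and "spaced N amino acids apart" is read here as N residues lying between the two cysteines, which is an assumption about the intended meaning.

```python
from typing import Iterable, Iterator


def two_cysteine_layouts(min_len: int = 6, max_len: int = 15,
                         spacings: Iterable[int] = range(4, 11)) -> Iterator[str]:
    """Yield every layout of a 2-cysteine add-on sequence.

    Each layout has a core 'C' + spacing * 'X' + 'C' plus optional
    randomized positions split between the two outer sides.
    """
    for spacing in spacings:
        core = "C" + "X" * spacing + "C"
        for total in range(max(min_len, len(core)), max_len + 1):
            outer = total - len(core)
            for left in range(outer + 1):
                yield "X" * left + core + "X" * (outer - left)


if __name__ == "__main__":
    layouts = list(two_cysteine_layouts())
    print(len(layouts), "layouts, e.g.", layouts[0], "and", layouts[-1])
```

Under this reading there are 210 distinct layouts between 6 and 15 residues long, so a practical library would normally sample only a few of them.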

[00161] Binding modules can be constructed based on natural protein scaffolds. Such scaffolds can be identified by
database searching. Libraries that are based on natural scaffolds can be subjected to phage display panning
followed by screening to identify sequences that specifically bind to a target of interest. 

[00162] A wide selection of natural scaffolds is available for constructing the binding modules. The choice of a
particular scaffold will depend on the intended target. Non-limiting examples of natural scaffolds include
snake-toxin-like proteins such as snake venom toxins and extracellular domains of human cell surface
receptors. Non-limiting examples of snake venom toxins are Erabutoxin B, gamma-Cardiotoxin, Fasciculin,
Muscarinic toxin, Erabutoxin A, Neurotoxin I, Cardiotoxin V4II (Toxin III), Cardiotoxin V, alpha-
Cobratoxin, long Neurotoxin 1, FS2 toxin, Bungarotoxin, Bucandin, Cardiotoxin CTXI, Cardiotoxin CTX
IIB, Cardiotoxin II, Cardiotoxin III, Cardiotoxin IV, Cobrotoxin 2, alpha-toxins, Neurotoxin II (cobrotoxin
B), Toxin B (long neurotoxin), Candotoxin, Bucain. Non-limiting examples of extracellular domains of
(human) cell surface receptors include CD59, Type II activin receptor, BMP receptor Ia ectodomain, TGF-
beta type II receptor extracellular domain. Other natural scaffolds include but are not limited to A-
domains, EGF, Ca-EGF, TNF-R, Notch, DSL, Trefoil, PD, TSP1, TSP2, TSP3, Anato, Integrin Beta,
Thyroglobulin, Defensin 1, Defensin 2, Cyclotide, SHKT, Disintegrins, Myotoxins, Gamma-Thioneins,
Conotoxin, Mu-Conotoxin, Omega-Atracotoxins, Delta-Atracotoxins, as well as additional families
disclosed in co-pending applications serial nos. 11/528,927 and 11/528,950, which are incorporated herein
in their entirety.

[00163] A large variety of methods has been described that allow one to identify binding molecules in a large
library of variants. One method is chemical synthesis. Library members can be synthesized on beads such
that each bead carries a different peptide sequence. Beads that carry ligands with a desirable specificity can 
be identified using labeled binding partners. Another approach is the generation of sub-libraries of peptides 
which allows one to identify specific binding sequences in an iterative procedure (Pinilla, C., et al. (1992)
BioTechniques, 13: 901-905). More commonly used are display methods where a library of variants is
expressed on the surface of a phage, protein, or cell. These methods have in common that the DNA or
RNA coding for each variant in the library is physically linked to the ligand. This enables one to detect or
retrieve the ligand of interest and then determine its peptide sequence by sequencing the attached DNA or 
RNA. Display methods allow one skilled in the art to enrich library members with desirable binding 
properties from large libraries of random variants. Frequently, variants with desirable binding properties 
can be identified from enriched libraries by screening individual isolates from an enriched library for 
desirable properties. Examples of display methods are fusion to lac repressor (Cull, M., et al. (1992) Proc.
Natl. Acad. Sci. USA, 89: 1865-1869) and cell surface display (Wittrup, K. D. (2001) Curr Opin Biotechnol,
12: 395-9). Of particular interest are methods where random peptides or proteins are linked to phage
particles. Commonly used are M13 phage (Smith, G. P., et al. (1997) Chem Rev, 97: 391-410) and T7
phage (Danner, S., et al. (2001) Proc Natl Acad Sci USA, 98: 12954-9). There are multiple methods
available to display peptides or proteins on M13 phage. In many cases, the library sequence is fused to the
N-terminus of protein pIII of the M13 phage. Phage typically carry 3-5 copies of this protein and thus
phage in such a library will in most cases carry between 3 and 5 copies of a library member. This approach is
referred to as multivalent display. An alternative is phagemid display where the library is encoded on a
phagemid. Phage particles can be formed by infection of cells carrying a phagemid with a helper phage
(Lowman, H. B., et al. (1991) Biochemistry, 30: 10832-10838). This process typically leads to monovalent
display. In some cases, monovalent display is preferred to obtain high affinity binders. In other cases
multivalent display is preferred (O'Connell, D., et al. (2002) J Mol Biol, 321: 49-56).

[00164] A variety of methods have been described to enrich sequences with desirable characteristics by phage
display. One can immobilize a target of interest by binding to immunotubes, microtiter plates, magnetic
beads, or other surfaces. Subsequently, a phage library is contacted with the immobilized target, phage that 
lack a binding ligand are washed away, and phage carrying a target specific ligand can be eluted by a 
variety of conditions. Elution can be performed by low pH, high pH, urea or other conditions that tend to
break protein-protein contacts. Bound phage can also be eluted by adding E. coli cells such that eluting 
phage can directly infect the added E. coli host. An interesting protocol is the elution with protease which 
can degrade the phage-bound ligand or the immobilized target. Proteases can also be utilized as tools to 
enrich protease resistant phage-bound ligands. For instance, one can incubate a library of phage-bound 
ligands with one or more (human or mouse) proteases prior to panning on the target of interest. This 
process degrades and removes protease-labile ligands from the library (Kristensen, P., et al. (1998) Fold 
Des, 3: 321-8). Phage display libraries of ligands can also be enriched for binding to complex biological
samples. Examples are the panning on immobilized cell membrane fractions (Tur, M. K., et al. (2003) Int J
Mol Med, 11: 523-7), or entire cells (Rasmussen, U. B., et al. (2002) Cancer Gene Ther, 9: 606-12; Kelly, 
K. A., et al. (2003) Neoplasia, 5: 437-44). In some cases one has to optimize the panning conditions to 
improve the enrichment of cell specific binders from phage libraries (Watters, J. M., et al. (1997) 
Immunotechnology, 3: 21-9). Phage panning can also be performed in live patients or animals. This 
approach is of particular interest for the identification of ligands that bind to vascular targets (Arap, W., et 
al. (2002) Nat Med, 8: 121-7). 

[00165] A variety of cloning methods are available that allow one skilled in the art to generate libraries of DNA
sequences that encode libraries of peptides. Random mixtures of nucleotides can be utilized to synthesize
oligonucleotides that contain one or multiple random positions. This process allows one to control the 
number of random positions as well as the degree of randomization. In addition, one can obtain random or 
semi-random DNA sequences by partial digestion of DNA from biological samples. Random 
oligonucleotides can be used to construct libraries of plasmids or phage that are randomized in pre-defined 
locations. This can be done by PCR fusion as described in (de Kruif, J., et al. (1995) J Mol Biol, 248: 97- 
105). Other protocols are based on DNA ligation (Felici, F., et al. (1991) J Mol Biol, 222: 301-10; Kay, B.
K., et al. (1993) Gene, 128: 59-65). Another commonly used approach is Kunkel mutagenesis where a
mutagenized strand of a plasmid or phagemid is synthesized using single stranded cyclic DNA as template. 
See, Sidhu, S. S., et al. (2000) Methods Enzymol, 328: 333-63; Kunkel, T. A., et al. (1987) Methods 
Enzymol, 154: 367-82. 
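
How the choice of nucleotide mixture controls the degree of randomization can be illustrated by expanding a degenerate codon. The sketch below is an example under stated assumptions: the NNK codon (N = any base, K = G or T) is a widely used randomization scheme but is not named in the text above, the helper names are hypothetical, and only a subset of IUPAC codes is included.

```python
from collections import Counter
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
# Subset of IUPAC nucleotide codes, sufficient for common library designs.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "N": "ACGT", "K": "GT", "S": "CG", "M": "AC", "R": "AG", "Y": "CT"}


def expand(degenerate_codon: str) -> list:
    """List all concrete codons encoded by a 3-letter degenerate codon."""
    return ["".join(p) for p in product(*(IUPAC[ch] for ch in degenerate_codon))]


def residue_counts(degenerate_codon: str) -> Counter:
    """Count how many codons encode each residue ('*' marks a stop codon)."""
    return Counter(CODON_TABLE[c] for c in expand(degenerate_codon))


if __name__ == "__main__":
    for scheme in ("NNN", "NNK"):
        counts = residue_counts(scheme)
        print(scheme, sum(counts.values()), "codons,", counts["*"], "stop codon(s)")
```

With the standard genetic code, NNK covers all 20 amino acids with 32 codons and a single stop codon (TAG), whereas fully random NNN uses 64 codons including three stops.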

[00166] Kunkel mutagenesis uses templates containing randomly incorporated uracil bases which can be obtained 
from E. coli strains like CJ236. The uracil-containing template strand is preferentially degraded upon 
transformation into E. coli while the in vitro synthesized mutagenized strand is retained. As a result most 
transformed cells carry the mutagenized version of the phagemid or phage. A valuable approach to 
increase diversity in a library is to combine multiple sub-libraries. These sub-libraries can be generated by 
any of the methods described above and they can be based on the same or on different scaffolds. 
[00167] A useful method to generate large phage libraries of short peptides has been recently described (Scholle, M.
D., et al. (2005) Comb Chem High Throughput Screen, 8: 545-51). This method is related to the Kunkel 
approach but it does not require the generation of single stranded template DNA that contains random 
uracil bases. Instead, the method starts with a template phage that carries one or more mutations close to 
the area to be mutagenized and said mutation renders the phage non-infective. The method uses a 
mutagenic oligonucleotide that carries randomized codons in some positions and that corrects the phage-
inactivating mutation in the template. As a result, only mutagenized phage particles are infective after
transformation and very few parent phage are contained in such libraries. This method can be further 
modified in several ways. For instance, one can utilize multiple mutagenic oligonucleotides to 
simultaneously mutagenize multiple discontiguous regions of a phage. We have taken this approach one 
step further by applying it to whole microproteins of >25, 30, 35, 40, 45, 50, 55 and 60 amino acids, 
instead of short peptides of <10, 15 or 20 amino acids, which poses an additional challenge. This approach 
now yields libraries of more than 10e10 transformants (up to 10e11) with a single transformation, so that a
single library with a diversity of 10e12 is expected from 10 transformations.
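
The relationship between transformant numbers and sequence space can be sketched numerically. The Python below is an illustration, not a result from the source: it reads "10e11" as 10^11 (an assumption about the notation), assumes clones are drawn uniformly and independently from the theoretical sequence space, and uses hypothetical function names.

```python
import math


def theoretical_diversity(n_random_positions: int, alphabet: int = 20) -> int:
    """Number of distinct sequences for fully randomized positions."""
    return alphabet ** n_random_positions


def pooled_library_size(per_transformation: float, n_transformations: int) -> float:
    """Total transformants obtained by pooling several transformations."""
    return per_transformation * n_transformations


def expected_coverage(library_size: float, diversity: float) -> float:
    """Expected fraction of all possible sequences present at least once.

    Uses the Poisson approximation 1 - exp(-N / D) for N clones sampled
    uniformly from D possible sequences.
    """
    return -math.expm1(-library_size / diversity)


if __name__ == "__main__":
    size = pooled_library_size(1e11, 10)
    for positions in (8, 9, 10):
        d = theoretical_diversity(positions)
        print(positions, "random positions:", f"{expected_coverage(size, d):.3f}")
```

Under these assumptions, a pooled library of 10^12 clones would be expected to contain most of the 20^9 sequences of a 9-position library but only a small fraction of a library with 12 or more fully randomized positions.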

[00168] Another variation of the Scholle method is to design the mutagenic oligonucleotide such that an amber stop 
codon in the template is converted into an ochre stop codon, and an ochre into an amber in the next cycle of 
mutagenesis. In this case the template phage and the mutagenized library members must be cultured in 
different suppressor strains of E. coli, alternating an ochre suppressor with amber suppressor strains. This 
allows one to perform successive rounds of mutagenesis of a phage by alternating between these two types 
of stop codons and two suppressor strains. 
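
The alternation between stop codons and suppressor strains can be written out as a round-by-round schedule. This sketch is illustrative only: `mutagenesis_schedule` is a hypothetical helper, and it relies on the standard codon assignments amber = TAG and ochre = TAA.

```python
STOP_CODONS = {"amber": "TAG", "ochre": "TAA"}


def mutagenesis_schedule(first_template_stop: str, n_rounds: int) -> list:
    """Return, for each round, the template stop codon, the stop codon left
    in the new library, and the suppressor strain needed to grow that library."""
    if first_template_stop not in STOP_CODONS:
        raise ValueError("first_template_stop must be 'amber' or 'ochre'")
    schedule = []
    stop = first_template_stop
    for round_number in range(1, n_rounds + 1):
        # The mutagenic oligonucleotide converts the template's stop codon
        # into the other type, so the host strain must switch as well.
        new_stop = "ochre" if stop == "amber" else "amber"
        schedule.append({
            "round": round_number,
            "template_stop": STOP_CODONS[stop],
            "library_stop": STOP_CODONS[new_stop],
            "library_host": f"{new_stop} suppressor strain",
        })
        stop = new_stop
    return schedule


if __name__ == "__main__":
    for row in mutagenesis_schedule("amber", 4):
        print(row)
```

Each round's library becomes the template for the next round, which is why the template stop codon in round n+1 equals the library stop codon of round n.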

[00169] Yet another variation of the Scholle approach involves the use of megaprimers with a single stranded phage 
DNA template. The megaprimer is a long ssDNA that was generated from the library inserts of the selected 
pool of phage from the previous round of panning. The goal is to capture the full diversity of library inserts 
from the previous pool, which was mutagenized in one or more areas, and transfer it to a new library in 
such a way that an additional area can be mutagenized. The megaprimer process can be repeated for 
multiple cycles using the same template which contains a stop-codon in the gene of interest. The 
megaprimer is a ssDNA (optionally generated by PCR) which contains 1) 5' and 3' overlap areas of at least
15 bases for complementarity to the ssDNA template, and 2) one or more previously selected library areas
(1, 2, 3, 4 or more) which were copied (optionally by PCR) from the pool of previously selected clones, and
3) a newly mutagenized library area that is to be selected in the next round of panning. The megaprimer is
optionally prepared by 1) synthesizing one or more oligonucleotides encoding the newly synthesized
library area and 2) by fusing this, optionally using overlap PCR, to a DNA fragment (optionally obtained
by PCR) which contains any other library areas which were previously optimized. Run-off or single 
stranded PCR of the combined (overlap) PCR product is used to generate the single stranded megaprimer 
that contains all of the previously optimized areas as well as the new library for an additional area that is to 
be optimized in the next panning experiment. This approach is expected to allow affinity maturation of 
proteins using multiple rapid cycles of library creation generating 10e11 to 10e12 diversity per cycle, each
followed by panning. 
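
The composition rules for a megaprimer stated above (overlaps of at least 15 bases at both ends, one or more previously selected library areas, and one newly mutagenized area) can be checked in silico before synthesis. The sketch below is a simplified illustration: the `Segment` type, the segment labels and `assemble_megaprimer` are hypothetical, and any constant sequence between library areas is ignored.

```python
from dataclasses import dataclass
from typing import List

MIN_OVERLAP = 15  # minimum length of each template-complementary end, in bases


@dataclass
class Segment:
    kind: str  # "overlap", "selected", or "new_library"
    seq: str


def assemble_megaprimer(segments: List[Segment]) -> str:
    """Validate the segment layout and return the concatenated sequence."""
    if len(segments) < 4:
        raise ValueError("need two overlaps, a selected area and a new library area")
    for end in (segments[0], segments[-1]):
        if end.kind != "overlap" or len(end.seq) < MIN_OVERLAP:
            raise ValueError(f"both ends must be overlaps of at least {MIN_OVERLAP} bases")
    inner_kinds = [s.kind for s in segments[1:-1]]
    if inner_kinds.count("new_library") != 1:
        raise ValueError("exactly one newly mutagenized library area is expected")
    if inner_kinds.count("selected") < 1:
        raise ValueError("at least one previously selected library area is expected")
    return "".join(s.seq for s in segments)


if __name__ == "__main__":
    # Placeholder sequences for illustration; 'N' and 'K' mark degenerate bases.
    primer = assemble_megaprimer([
        Segment("overlap", "ACGTACGTACGTACGT"),
        Segment("selected", "TGTGGTTCTTGT"),
        Segment("new_library", "NNKNNKNNKNNK"),
        Segment("overlap", "GGATCCGGATCCGGATCC"),
    ])
    print(len(primer), primer)
```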

[00170] A variety of methods can be applied to introduce sequence diversity into (previously selected or naive) 
libraries of microproteins or to mutate individual microprotein clones with the goal of enhancing their 
binding or other properties like manufacturing, stability or immunogenicity. In principle, all the methods
that can be used to generate libraries can also be used to introduce diversity into enriched (previously
selected) libraries of microproteins. In particular, one can synthesize variants with desirable binding or
other properties and design partially randomized oligonucleotides based on these sequences. This process 
allows one to control the positions and degree of randomization. One can deduce the utility of individual 
mutations in a protein from sequence data of multiple variants using a variety of computer algorithms 
(Jonsson, J., et al. (1993) Nucleic Acids Res, 21: 733-9; Amin, N., et al. (2004) Protein Eng Des Sel, 17:
787-93). Of particular interest for the re-mutagenesis of enriched libraries is DNA shuffling (Stemmer, W.
P. C. (1994) Nature, 370: 389-391), which generates recombinants of individual sequences in an enriched
library. Shuffling can be performed using a variety of modified PCR conditions and templates may be
partially degraded to enhance recombination. An alternative is the recombination at pre-defined positions
using restriction enzyme-based cloning. Of particular interest are methods utilizing type IIS restriction
enzymes that cleave DNA outside of their sequence recognition site (Collins, J., et al. (2001) J Biotechnol,
74: 317-38). Restriction enzymes that generate non-palindromic overhangs can be utilized to cleave
plasmids or other DNA encoding variant mixtures in multiple locations and complete plasmids can be re- 
assembled by ligation (Berger, S. L., et al. (1993) Anal Biochem, 214: 571-9). Another method to 
introduce diversity is PCR-mutagenesis where DNA sequences encoding library members are subjected to 
PCR under mutagenic conditions. PCR conditions have been described that lead to mutations at relatively 
high mutation frequencies (Leung, D., et al. (1989) Technique, 1: 11-15). In addition, a polymerase with
reduced fidelity can be employed (Vanhercke, T., et al. (2005) Anal Biochem, 339: 9-14). A method of
particular interest is based on mutator strains (Irving, R. A., et al. (1996) Immunotechnology, 2: 127-43;
Coia, G., et al. (1997) Gene, 201: 203-9). These are strains that carry defects in one or more DNA repair
genes. Plasmids or phage or other DNA in these strains accumulate mutations during normal replication. 
One can propagate individual clones or enriched populations in mutator strains to introduce genetic 
diversity. Many of the methods described above can be utilized in an iterative process. One can apply 
multiple rounds of mutagenesis and screening or panning to entire genes, or to portions of a gene, or one 
can mutagenize different portions of a protein during each subsequent round (Yang, W. P., et al. (1995) J 
Mol Biol, 254: 392-403).

[00171] The libraries can be further treated to reduce artifacts. Known artifacts of phage panning include 1) non-
specific binding based on hydrophobicity, and 2) multivalent binding to the target, either due to a) the
pentavalency of the pIII phage protein, or b) due to the formation of disulfides between different
microproteins, resulting in multimers, or c) due to high density coating of the target on a solid support and 
3) context-dependent target binding, in which the context of the target or the context of the microproteins 
becomes critical to the binding or inhibition activity. Different treatment steps can be taken to minimize the 
magnitude of these problems. For example, such treatments are applied to the whole library, but some 
useful treatments that remove bad clones can only be applied to pools of soluble proteins or only to 
individual soluble proteins. 

[00172] Libraries of cysteine-containing scaffolds are likely to contain free thiols, which can complicate directed
evolution by cross-linking to other proteins. One approach is to remove the worst clones from the library by
passing it over a free-thiol column, thus removing all clones that have one or more free sulfhydryls. Clones 
with free SH groups can also be reacted with biotin-SH reagents, enabling efficient removal of clones with 
reactive SH groups using Streptavidin columns. Another approach is to not remove the free thiols, but to 
inactivate them by capping them with sulfhydryl-reactive chemicals such as iodoacetic acid. Of particular 
interest are bulky or hydrophilic sulfhydryl reagents that reduce the non-specific target binding of modified
variants.
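
When sequence data are available for selected clones, one simple computational counterpart to the physical removal of free-thiol clones is to flag sequences with an odd number of cysteines, since at least one cysteine in such a sequence cannot be part of an intramolecular disulfide. The sketch below is an illustration with hypothetical function names; an even cysteine count does not prove that all cysteines are paired, so this is only a coarse pre-filter and not a substitute for the column-based treatments described above.

```python
from typing import Iterable, List, Tuple


def has_unavoidable_free_thiol(sequence: str) -> bool:
    """True when the cysteine count is odd, so one cysteine must stay unpaired."""
    return sequence.upper().count("C") % 2 == 1


def split_by_cysteine_parity(sequences: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Return (kept, flagged) clones based on cysteine parity."""
    kept, flagged = [], []
    for seq in sequences:
        (flagged if has_unavoidable_free_thiol(seq) else kept).append(seq)
    return kept, flagged


if __name__ == "__main__":
    # Placeholder sequences for illustration only.
    kept, flagged = split_by_cysteine_parity(["CAGTCPGC", "CAGTCPGCC", "AGTP"])
    print("kept:", kept)
    print("flagged:", flagged)
```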

[00173] Examples of context dependence are all of the constant sequences, including pIII protein, linkers, peptide
tags, biotin-streptavidin, Fc and other fusion proteins that contribute to the interaction. The typical
approach for avoiding context-dependence involves switching the context as frequently as practical in order
to avoid buildup. This may involve alternating between different display systems (i.e., M13 versus T7, or
M13 versus Yeast), alternating the tags and linkers that are used, alternating the (solid) support used for
immobilization (i.e., immobilization chemistry) and alternating the target protein itself (different vendors,
different fusion versions).

[00174] Library treatments can also be used to select for proteins with preferred qualities. One option is the
treatment of libraries with proteases in order to remove unstable variants from the library. The proteases
used are typically those that would be encountered in the application. For pulmonary delivery, one would
use lung proteases, for example obtained by a pulmonary lavage. Similarly, one would obtain mixtures of
proteases from serum, saliva, stomach, intestine, skin, nose, etc. However, it is also possible to use
mixtures of single purified proteases. An extensive list of proteases is shown in [Appendix E]. The phage
themselves are exceptionally resistant to most proteases and other harsh treatments.
[00175] For example, it is possible to select the library for the most stable structures, i.e., those with the strongest
disulfide bonds, by exposing it to increasing concentrations of reducing agents (i.e., DTT or
beta-mercaptoethanol), thus eliminating the least stable structures first. One would typically use reducing
agent (i.e., DTT, BME, other) concentrations from 2.5mM, to 5mM, 10mM, 20mM, 30mM, 40mM, 50mM,
60mM, 70mM, 80mM, 90mM or even 100mM, depending on the desired stability.
[00176] It is also possible to select for clones that can be efficiently refolded in vitro, by reducing the entire display 
library with a high level of reducing agent, followed by gradually re-oxidizing the protein library to reform 
the disulfides, followed by the removal of clones with free SH groups, as described above. This process can 
be applied once or multiple times to eliminate clones that have low refolding efficiency in vitro.

[00177] One approach is to apply a genetic selection for protein expression level, folding and solubility as described 
by A. C. Fisher et al. (2006) Genetic selection for protein solubility enabled by the folding quality control 
feature of the twin-arginine translocation pathway. Protein Science (online). After panning of display 
libraries (optional), one would like to avoid screening thousands of clones at the protein level for target 
binding, expression level and folding. An alternative is to clone the whole pool of selected inserts into a
beta-lactamase fusion vector, which, when plated on beta-lactam, the authors demonstrated to be selective for
well-expressed, fully disulfide bonded and soluble proteins. 
[00178] Following M13 Phage display of protein libraries and panning on targets for one or more cycles, there are a 
variety of ways to proceed, including (1) screening of individual phage clones by phage ELISA, which 
measures the number of phage particles (using anti-M13 antibodies) that bind to an immobilized target; (2)
transferring from M13 into T7 phage display libraries. The second approach is particularly useful in
reducing the occurrence of false positives based on valency. Any single library format tends to favor
clones that can form high-avidity contacts with the target. This is the reason that screening of soluble
proteins is important, although this is a tedious solution. The multivalency achieved in T7 phage display is
likely very different from that achieved in M13 display, and cycling between T7 and M13 can be an
excellent approach to reducing the occurrence of false positives based on valency.

[00179] Filter lift is another methodology that can be used with bacterial colonies grown at high density on large agar
plates (10e2-10e5). Small amounts of some proteins are secreted into the media and end up bound to the
filter membrane (nitrocellulose or nylon). The filters are then blocked in non-fat milk, 1% Casein
hydrolysate or a 1% BSA solution and incubated with the target protein that has been labeled with a
fluorescent dye or an indicator enzyme (directly or indirectly via antibodies or via biotin-streptavidin). The
location of the colony is determined by overlaying the filter on the back of the plate and all of the positive
colonies are selected and used for additional characterization. The advantage of filter lifts is that they can be
made to be affinity-selective by reading the signal after washing for different periods of time. The signal of
high affinity clones 'fades' slowly, whereas the signal of low affinity clones fades rapidly. Such affinity
characterization typically requires a 3-point assay with a well-based assay and may provide better clone-to-
clone comparability than well-based assays. Gridding of colonies into an array is useful since it minimizes
differences due to colony size or location.
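
The fading of signal during washing can be turned into a rough ranking of clones. The sketch below assumes simple first-order dissociation, S(t) = S0 * exp(-k_off * t), which is a modelling assumption and not a statement from the source; `estimate_off_rate` is a hypothetical helper that fits k_off by least squares on the logarithm of the signal read at several wash times.

```python
import math
from typing import Sequence


def estimate_off_rate(times: Sequence[float], signals: Sequence[float]) -> float:
    """Estimate an apparent off-rate from signal readings at several wash times.

    Fits ln(signal) = ln(S0) - k_off * t by ordinary least squares and
    returns k_off in the reciprocal of the time unit used.
    """
    if len(times) != len(signals) or len(times) < 2:
        raise ValueError("need at least two matching time/signal readings")
    if any(s <= 0 for s in signals):
        raise ValueError("signals must be positive to take logarithms")
    logs = [math.log(s) for s in signals]
    n = len(times)
    t_mean = sum(times) / n
    y_mean = sum(logs) / n
    sxx = sum((t - t_mean) ** 2 for t in times)
    if sxx == 0:
        raise ValueError("wash times must not all be identical")
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in zip(times, logs))
    return -sxy / sxx


if __name__ == "__main__":
    wash_minutes = [0.0, 60.0, 120.0]
    # Made-up readings for two colonies, for illustration only.
    print("slow fader:", estimate_off_rate(wash_minutes, [100.0, 95.0, 90.0]))
    print("fast fader:", estimate_off_rate(wash_minutes, [100.0, 50.0, 25.0]))
```

A smaller fitted off-rate corresponds to a signal that fades slowly, which the text above associates with higher affinity clones.
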
N-terminal modules:

[00180] The subject MURPs can contain N-terminal modules (NM), which are particularly useful e.g., in
facilitating production of the MURPs. The NM can be a single methionine residue when the product is
expressed in the E. coli cytoplasm. A typical product format is an URP fused to a therapeutic protein,
which is expressed in the bacterial cytoplasm so that the N-terminus is formyl-methionine. The formyl- 
methionine can either be permanent or temporary, if it is removed by biological or chemical processing. 
[00181] The NM can also be a peptide sequence that has been engineered for proteolytic processing, which can be
used to remove tags or to remove fusion proteins. The N-terminal module can be engineered to facilitate
the purification of the MURP by including an affinity tag such as the Flag-, Myc-, HA- or His-tag. The N-
terminal module can also include an affinity tag that can be used for the detection of the MURP. An NM
can be engineered or selected for high-level expression of the MURP. It can also be engineered or selected
to enhance the protease resistance of the resulting MURP. MURPs can be produced with an N-terminal
module that facilitates expression and/or purification. This N-terminal module can be cleaved off during
the production process with a protease, such that the final product does not contain an N-terminal module.
[00182] By optimizing the amino acid and codon choice of the N-terminal module one can increase recombinant
production. The N-terminal module can also contain a processing site that can be cleaved by a specific
protease like factor Xa, thrombin, enterokinase, or Tobacco Etch Virus (TEV) protease. Processing sites
can also be designed to be cleavable by chemical hydrolysis. An example is the amino acid sequence asp-
pro that can be cleaved under acidic conditions. An N-terminal module can also be designed to facilitate
the purification of a MURP. For example, N-terminal modules can be designed to contain multiple his
residues which allow product capture by immobilized metal chromatography. N-terminal modules can
contain peptide sequences that can be specifically captured or detected by antibodies. Examples are FLAG,
HA, c-myc.

C-terminal modules: 

[00183] MURPs can contain a C-terminal module, which is particularly useful e.g., in facilitating production of
the MURPs. For example, the C-terminal module can comprise a cleavage site to effect proteolytic processing
to remove sequences that are fused, thereby increasing protein expression or facilitating purification. In
particular, the C-terminal module can also contain a processing site that can be cleaved by a specific
protease like factor Xa, thrombin, TEV protease or enterokinase. Processing sites can also be designed to
be cleavable by chemical hydrolysis. An example is the amino acid sequence asp-pro that can be cleaved
under acidic conditions. The C-terminal module can be an affinity tag aimed at facilitating the purification
of the MURP. For example, C-terminal modules can be designed to contain multiple his residues which 
allow product capture by immobilized metal chromatography. C-terminal modules can contain peptide 
sequences that can be specifically captured or detected by antibodies. Non-limiting examples of the tags 
include FLAG-, HA-, c-myc, or His-tag. C-terminal module can also be engineered or selected to enhance 
the protease resistance of the resulting MURP. 
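
When designing N-terminal or C-terminal modules it can be useful to scan a candidate sequence for processing sites, both intended and accidental. The sketch below is an illustration only: the motifs are commonly cited consensus recognition sequences for the proteases named above and are not taken from this document, real protease specificity is broader than these patterns, and `find_processing_sites` is a hypothetical helper.

```python
import re
from typing import List, Tuple

# Commonly cited consensus motifs (one-letter code); treated here as assumptions.
PROCESSING_SITES = {
    "factor Xa": r"I[ED]GR",
    "thrombin": r"LVPRGS",
    "enterokinase": r"DDDDK",
    "TEV protease": r"ENLYFQ[GS]",
    "acid-labile Asp-Pro": r"DP",
}


def find_processing_sites(sequence: str) -> List[Tuple[str, int, str]]:
    """Return (site name, 1-based start position, matched motif), sorted by position."""
    hits = []
    for name, pattern in PROCESSING_SITES.items():
        for match in re.finditer(pattern, sequence.upper()):
            hits.append((name, match.start() + 1, match.group()))
    return sorted(hits, key=lambda hit: hit[1])


if __name__ == "__main__":
    # Placeholder construct: His-tag, TEV site, then an Asp-Pro bond.
    for hit in find_processing_sites("MHHHHHHENLYFQGSDPAK"):
        print(hit)
```

Such a scan helps confirm that an engineered cleavage site is present where intended and that motifs such as Asp-Pro do not occur inadvertently elsewhere in the product.
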
[00184] Where desired, the N-terminus of the protein can be linked to its own C-terminus. For example, linking 
these two modules can be carried out by creating an amino acid-like natural linkage (peptide bond) or by
using an exogenous linking entity. Of particular interest are cyclotides, a family of small proteins in which
this occurs naturally. Adopting a structural format like cyclotides is expected to provide additional stability
against exo-proteases. Such intramolecular linkage typically works better at lower protein concentrations. 
Effector modules: 

[00185] MURPs can comprise one or multiple effector modules (EMs), or none at all. Effector modules typically 
do not provide the targeting, but they provide an activity required for therapeutic effect, like cell-killing. 
EMs can be pharmaceutically active small molecules (i.e., toxic drugs), peptides or proteins. Non-limiting
examples are cytokines, antibodies, enzymes, growth factors, hormones, receptors, receptor agonists or
antagonists, whether whole or a fragment or domain thereof. Effector modules can also comprise peptide 
sequences that carry chemically linked small molecule drugs, whether synthetic or natural. Optionally, 
these effector molecules can be linked to the effector module via chemical linkers, which may or may not 
be cleaved under selected conditions leading to a release of the toxic activity. EMs can also include 
radioisotopes and their chelates, as well as various labels for PET and MRI. Effector modules can also be 
toxic to a cell or a tissue. Of particular interest are MURPs that contain toxic effector modules and binding 
modules with specificity for a diseased tissue or diseased cell type. Such MURPs can specifically
accumulate in a diseased tissue or in diseased cells and they can exert their toxic action preferentially in the
diseased cells or tissues. Listed below are exemplary effector modules. 

[00186] Enzymes - Effector modules can be enzymes. Of particular interest are enzymes that degrade metabolites 
that are critical for cellular growth like carbohydrates or amino acids or lipids or co-factors. Other 
examples for effector modules with enzymatic activity are RNase, DNase, phosphatase, asparaginase,
histidinase, arginase, and beta-lactamase. Effector modules with enzymatic activity can be toxic when delivered
to a tissue or cell. Of particular interest are MURPs that combine effector modules that are toxic and 
binding modules that bind specifically to a diseased tissue. Enzymes that convert an inactive prodrug into 
an active drug at the tumor site are also potential effector modules. 

[00187] Drug - The subject MURP can contain an effector that is a drug. Where desired, sequences can be
designed for the organ-selective delivery of drug molecules. An example is illustrated in figure 8. An URP
sequence can be fused to a protein that preferentially binds to diseased tissue. The same URP sequence can 
contain one or more amino acid residues that can be modified for the attachment of drug molecules. Such a 
conjugate can bind to diseased tissue with high specificity and the attached drug molecules can result in 
local action while minimizing systemic drug exposure. The MURP can be designed to facilitate the release 
of drug molecules at the target site by introducing protease-sensitive sites that can be cleaved by native 
proteases at the site of desired action. A significant advantage of using URP sequences for the design of 
drug delivery constructs is that one can avoid undesirable interactions between the drug molecule and the 
targeting domain of the construct. Many drug molecules that can be conjugated to targeting domains have 
significant hydrophobicity and the resulting conjugates tend to aggregate. By adding hydrophilic URP 
sequences to such constructs one can improve the solubility of the resulting delivery constructs and as a 
consequence reduce the aggregation tendency. Furthermore, one can increase the number of drug 
molecules that can be fused to a targeting domain by adding long URP sequences. In addition, the use of 
URP sequences allows one to optimize the distance between the drug conjugation sites to facilitate 
complete conjugation. The list of suitable drugs includes but is not limited to chemotherapeutic agents 
such as thiotepa and cyclophosphamide (CYTOXAN™); alkyl sulfonates such as busulfan, improsulfan 
and piposulfan; aziridines such as benzodopa, carboquone, meturedopa, and uredopa; ethylenimines and 
methylamelamines including altretamine, triethylenemelamine, triethylenephosphoramide, 
triethylenethiophosphoramide and trimethylolomelamine; nitrogen mustards such as chlorambucil, 
chlornaphazine, cholophosphamide, estramustine, ifosfamide, mechlorethamine, mechlorethamine oxide 
hydrochloride, melphalan, novembichin, phenesterine, prednimustine, trofosfamide, uracil mustard; 
nitrosoureas such as carmustine, chlorozotocin, fotemustine, lomustine, nimustine, ranimustine; antibiotics 
such as aclacinomycins, actinomycin, anthramycin, azaserine, bleomycins, cactinomycin, calicheamicin, 
carabicin, carminomycin, carzinophilin, chromomycins, dactinomycin, daunorubicin, detorubicin, 6-diazo- 
5-oxo-L-norleucine, doxorubicin, epirubicin, esorubicin, idarubicin, marcellomycin, mitomycins, 
mycophenolic acid, nogalamycin, olivomycins, peplomycin, porfiromycin, puromycin, quelamycin, 
rodorubicin, streptonigrin, streptozocin, tubercidin, ubenimex, zinostatin, zorubicin; anti-metabolites such 
as methotrexate and 5-fluorouracil (5-FU); folic acid analogues such as denopterin, methotrexate, 
pteropterin, trimetrexate; purine analogs such as fludarabine, 6-mercaptopurine, thiamiprine, thioguanine; 
pyrimidine analogs such as ancitabine, azacitidine, 6-azauridine, carmofur, cytarabine, dideoxyuridine, 
doxifluridine, enocitabine, floxuridine; androgens such as calusterone, dromostanolone propionate, 
epitiostanol, mepitiostane, testolactone; anti-adrenals such as aminoglutethimide, mitotane, trilostane; folic 
acid replenisher such as folinic acid; aceglatone; aldophosphamide glycoside; aminolevulinic acid; 
amsacrine; bestrabucil; bisantrene; edatraxate; defofamine; demecolcine; diaziquone; duocarmycin, 
maytansin, auristatin, eflornithine; elliptinium acetate; etoglucid; gallium nitrate; hydroxyurea; lentinan; 
lonidamine; mitoguazone; mitoxantrone; mopidamol; nitracrine; pentostatin; phenamet; pirarubicin; 
podophyllinic acid; 2-ethylhydrazide; procarbazine; PSK®; razoxane; sizofiran; spirogermanium; 
tenuazonic acid; triaziquone; 2,2',2''-trichlorotriethylamine; urethan; vindesine; dacarbazine; 
mannomustine; mitobronitol; mitolactol; pipobroman; gacytosine; arabinoside ("Ara-C"); 
cyclophosphamide; thiotepa; taxanes, e.g. paclitaxel (TAXOL™, Bristol-Myers Squibb Oncology, 
Princeton, N.J.) and docetaxel (TAXOTERE™, Rhone-Poulenc Rorer, Antony, France); chlorambucil; 
gemcitabine; 6-thioguanine; mercaptopurine; methotrexate; platinum analogs such as cisplatin and 
carboplatin; vinblastine; platinum; etoposide (VP-16); ifosfamide; mitomycin C; mitoxantrone; vincristine; 
vinorelbine; navelbine; novantrone; teniposide; daunomycin; aminopterin; xeloda; ibandronate; 
camptothecin-11 (CPT-11); topoisomerase inhibitor RPS 2000; difluoromethylornithine (DMFO); retinoic 
acid; esperamicins; capecitabine; and pharmaceutically acceptable salts, acids or derivatives of any of the 
above. Also included as suitable chemotherapeutic cell conditioners are anti-hormonal agents that act to 
regulate or inhibit hormone action on tumors such as anti-estrogens including for example tamoxifen, 
raloxifene, aromatase inhibiting 4(5)-imidazoles, 4-hydroxytamoxifen, trioxifene, keoxifene, LY 117018, 
onapristone, and toremifene (Fareston); and anti-androgens such as flutamide, nilutamide, bicalutamide, 
leuprolide, goserelin, doxorubicin, daunomycin, duocarmycin, vincristine, and vinblastine. 

[00188] Other drugs that can be used as the effector modules include those that are useful for treating inflammatory 
conditions, cardiac diseases, infectious diseases, respiratory diseases, autoimmune diseases, neuronal and 
muscular disorders, metabolic disorders, and cancers. 

[00189] Additional drugs that can be used as the effectors in MURPs include agents for pain and inflammation such 
as histamine and histamine antagonists, bradykinin and bradykinin antagonists, 5-hydroxytryptamine 
(serotonin), lipid substances that are generated by biotransformation of the products of the selective 
hydrolysis of membrane phospholipids, eicosanoids, prostaglandins, thromboxanes, leukotrienes, aspirin, 
nonsteroidal anti-inflammatory agents, analgesic-antipyretic agents, agents that inhibit the synthesis of 
prostaglandins and thromboxanes, selective inhibitors of the inducible cyclooxygenase, selective inhibitors 
of the inducible cyclooxygenase-2, autacoids, paracrine hormones, somatostatin, gastrin, cytokines that 
mediate interactions involved in humoral and cellular immune responses, lipid-derived autacoids, 
eicosanoids, β-adrenergic agonists, ipratropium, glucocorticoids, methylxanthines, sodium channel 
blockers, opioid receptor agonists, calcium channel blockers, membrane stabilizers and leukotriene 
inhibitors. 

[00190] Other drugs that can be used as effectors include agents for the treatment of peptic ulcers, agents for the 
treatment of gastroesophageal reflux disease, prokinetic agents, antiemetics, agents used in irritable bowel 
syndrome, agents used for diarrhea, agents used for constipation, agents used for inflammatory bowel 
disease, agents used for biliary disease, agents used for pancreatic disease. 

[00191] Radionuclides - MURPs can be designed for the tissue-targeted delivery of radionuclides as well as for 
imaging with radionuclides. URPs are ideal for imaging because the halflife can be optimized by changing 
the length of the URP. For most imaging applications a moderately long URP is likely to be preferred, 
providing a halflife of 5 minutes to a few hours, not days or weeks. MURPs can be designed such that they 
only contain a single or a small defined number of amino groups that can be modified with chelating agents 
(such as DOTA) for radioisotopes such as technetium, indium, yttrium, (EXPAND). Alternative methods 
of conjugation are through reserved cysteine side chains. Such radionuclide-carrying MURPs can be 
employed for the treatment of tumors or other diseased tissues, as well as for imaging. 
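
The interplay between the halflife of the carrier and the decay of the radionuclide can be sketched numerically. The example below assumes the standard relation 1/T_eff = 1/T_phys + 1/T_bio for the effective halflife of a radiolabeled agent. The function name, the physical halflife of about 6 hours (roughly that of technetium-99m) and the biological halflives are illustrative assumptions and are not values taken from this disclosure.

```python
def effective_half_life(physical_h: float, biological_h: float) -> float:
    """Effective halflife (hours) from physical decay and biological clearance.

    Uses 1/T_eff = 1/T_phys + 1/T_bio, rearranged to avoid repeated division.
    """
    if physical_h <= 0 or biological_h <= 0:
        raise ValueError("halflives must be positive")
    return (physical_h * biological_h) / (physical_h + biological_h)


if __name__ == "__main__":
    physical = 6.0  # assumed physical halflife in hours (illustrative)
    # Hypothetical biological halflives for short, medium and long URPs.
    for biological in (0.5, 2.0, 24.0):
        t_eff = effective_half_life(physical, biological)
        print(f"T_bio = {biological:5.1f} h -> T_eff = {t_eff:.2f} h")
```

Under these assumptions the effective halflife is always shorter than either input, so lengthening the URP beyond the point where clearance is much slower than decay adds little to the imaging window.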

[00192] Many pharmaceutically active proteins or protein domains can be used as effector modules in MURPs. 
Examples are the following proteins as well as fragments of these proteins: cytokines, growth factors, 
enzymes, receptors, microproteins, hormones, erythropoietin, adenosine deaminase, asparaginase, arginase, 
interferon, growth hormone, growth hormone releasing hormone, G-CSF, GM-CSF, insulin, hirudin, TNF- 
receptor, uricase, rasburicase, axokine, RNAse, DNAse, phosphatase, pseudomonas exotoxin, ricin, 
gelonin, desmoteplase, laronidase, thrombin, blood clotting enzyme, VEGF, protropin, somatropin, 
alteplase, interleukin, factor VII, factor VIII, factor X, factor IX, dornase, glucocerebrosidase, follitropin, 
glucagon, thyrotropin, nesiritide, alteplase, teriparatide, agalsidase, laronidase, methioninase. 

[00193] Protease-activated MURPs: To enhance the therapeutic index of an effector module, one can insert 
protease-labile sequences into URP sequences that are sensitive to proteases that are preferentially found in 
serum or in the target tissue to be treated by the MURP. This approach is illustrated in figure 9. Some 
designs allow one to construct proteins that are selectively activated when reaching a target tissue. Of 
particular interest are MURPs that are activated at a disease site. To facilitate such target-specific 
activation one can attach URP sequences in close proximity to the active site or receptor binding site of the 
effector module such that the resulting fusion protein has limited biological activity. Of particular interest 
is the activation of an effector module at a tumor site. Many tumor tissues express proteases in relatively 
high concentrations and sequences that are specifically cleaved by these tumor proteases can be inserted 
into URP sequences. For example, most prostate tumor tissues contain high concentrations of prostate 
specific antigen (PSA) which is a serine protease. Prodrugs consisting of a PSA-labile peptide conjugated 
to the cancer drug doxorubicin have shown selective activation in prostate tissue [DeFeo-Jones, D., et al. 
(2000) Nat Med, 6: 1248]. Of particular interest for disease-specific activation are proteins with cytostatic 
or cytotoxic activity like TNFalpha, and many cytokines and interleukins. Another application is the 
selective activation of proteins at the site of inflammation or at the site of viral or bacterial infection. 
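
The insertion of a protease-labile sequence into a URP sequence, and the fragments expected after cleavage, can be sketched as a simple string operation. The URP repeat "GGSGGS", the motif "ACDE" and both function names are hypothetical placeholders chosen for illustration; they are not sequences from this disclosure, and cleavage is modeled, as an assumption, as occurring immediately after the last residue of the motif.

```python
from typing import List


def insert_cleavage_site(urp: str, site: str, position: int) -> str:
    """Return the URP sequence with a protease-labile motif inserted at position."""
    if not 0 <= position <= len(urp):
        raise ValueError("position must lie within the URP sequence")
    return urp[:position] + site + urp[position:]


def fragments_after_cleavage(construct: str, site: str) -> List[str]:
    """Split a construct after every occurrence of the motif (simplified model)."""
    if not site:
        raise ValueError("site must not be empty")
    fragments = []
    start = 0
    idx = construct.find(site, start)
    while idx != -1:
        cut = idx + len(site)  # assumed cut point: directly after the motif
        fragments.append(construct[start:cut])
        start = cut
        idx = construct.find(site, start)
    if start < len(construct):
        fragments.append(construct[start:])
    return fragments


if __name__ == "__main__":
    urp = "GGSGGS" * 2          # hypothetical URP stretch
    site = "ACDE"               # hypothetical protease-labile motif
    construct = insert_cleavage_site(urp, site, 6)
    print(construct)
    print(fragments_after_cleavage(construct, site))
```

A real design would replace the placeholder motif with a sequence validated against the protease of the target tissue.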
[00194] Methods of production - MURPs containing URP sequences can be produced using molecular biology 
approaches that are well known in the art. A variety of cloning vectors are available for various expression 
systems like mammalian cells, yeast, and microbes. Of particular interest as expression hosts are E. coli, S. 
cerevisiae, P. pastoris, and Chinese hamster ovary cells. Of particular interest are hosts that have been 
optimized to widen their codon usage. Of particular interest is a host that has been modified to enhance 
expression of GRS. That can be done by providing DNA that encodes glycine-specific tRNAs. In 
addition, one can engineer the host such that loading of glycine-specific tRNAs is enhanced. The DNA 
encoding the enhanced protein can be operationally linked to a promoter sequence. The DNA encoding 
the enhanced protein as well as the operationally linked promoter can be part of a plasmid vector, viral 
vector or it can be inserted into the chromosome of the host. 
[00195] For production one can culture the host under conditions that facilitate the production of the enhanced 
protein. Of particular interest are conditions that improve the production of GRS. 
[00196] The subject MURPs can adopt a variety of formats. For instance, the MURPs can contain URPs that are 
fused to pharmaceutically active proteins to produce slow-release products. Such products can be injected 
or implanted locally for instance into or under the skin of a patient. Due to its large hydrodynamic radius 
the URP sequences-containing product is slowly released from the injection or implantation site which 
leads to a reduction of the frequency of injection or implantation. The URP sequences can be designed to 
contain regions that bind to cell surfaces or tissue in order to prolong the local retention of the drug at the 
injection site. Of particular interest are URP-containing products that can be formulated as soluble 
compounds but form aggregates or precipitates upon injection. This aggregation or precipitation can be 
triggered by a change in pH between the formulated product and the pH at the injection site. Alternatives 
are URP-containing products that precipitate or form aggregates as a result of a change in redox conditions. 
Yet another approach is a URP-containing product that is stabilized in solution by addition of non-active 
solutes, but that precipitates or aggregates upon injection as a result of diffusion of the solubilizing solutes. 
Another approach is to design URP-containing products that contain one or multiple Lysine or Cysteine 
residues in their URP sequence and that can be cross-linked prior to injection. 
[00197] Where desired, the MURP is monomeric (here meaning not crosslinked) when manufactured and 
formulated and when injected, but after subcutaneous injection the protein starts to crosslink with itself or 
with native human proteins, forming a polymer under the skin from which active drug molecules are freed 
only very gradually. Such release can be by disulfide bond reduction or disulfide shuffling as illustrated in 
Fig. 18, or it can be mediated by proteolysis as shown in Fig. 19, releasing active fragments into the 
circulation. It is important that these active fragments are large enough to have a long halflife, because the 
longer their secretion halflife, the lower the dose of the released protein can be, allowing the use of a lower 
dose of product to be injected or a longer time between injections. 
[00198] One approach that offers these advantages is disulfide-mediated crosslinking of proteins. For example, a 
protein drug would be manufactured with a cyclic peptide in it (one or more). This cyclic peptide may or 
may not be involved in binding to the target. This protein is manufactured with the cyclic peptide formed, 
i.e., in oxidized form, to simplify purification. However, the product is then reduced and formulated to keep 
the protein in reduced form. It is important that the cyclic peptide reduces at a low concentration of 
reducing agent, such as 0.25, 0.5, 1.0, 2.0, 4.0 or 8.0 mM dithiothreitol or beta-mercaptoethanol or 
cysteine or equivalent reducing agent, so that the cyclic peptide can be reduced without reducing other 
disulfide containing protein modules in the product. The use of FDA approved reducing agents is preferred, 
such as cysteine or glutathione. After subcutaneous injection, the low molecular weight reducing agent 
diffuses away rapidly or is neutralized by human proteins, exposing the drug to an oxidizing environment 
while it is still at a high molar concentration, which causes crosslinking of cysteines located on different 
protein chains, which leads to polymerization of the drug at the injection site. The longer the distance 
between the cysteines in the cyclic peptide, and the higher the concentration of the drug, the higher the 
degree of polymerization of the drug will be, since polymerization competes with cyclic peptide 
reformation. Over time, disulfide reduction and oxidation will cause disulfide reshuffling, which will lead 
to cyclic peptide reformation and monomerization and resolubilization of the drug. The release of the drug 
from the polymer can also occur via proteolysis which could be targeted and controlled or increased by 
building in cleavage sites for serum proteases. The crosslinking of the proteins could also be performed 
with a chemical protein-protein crosslinking agent, such as the ones listed in [table x]. Ideally, this is an 
already FDA-approved agent, such as those used for vaccine conjugation or conjugation of chemicals to 
proteins. 
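
The stated dependence of polymerization on cysteine spacing and drug concentration can be illustrated with a deliberately simple competition model. It assumes that a reduced cysteine either re-forms the cyclic peptide or pairs with a cysteine on another chain, that the intramolecular route behaves like an "effective molarity" falling with loop length n as n^(-3/2) (an ideal-chain scaling used here as an assumption), and that the reference effective molarity and reference loop length are hypothetical numbers. None of these parameters come from this disclosure.

```python
def intermolecular_fraction(
    drug_conc_mM: float,
    loop_length: int,
    em_ref_mM: float = 10.0,   # hypothetical effective molarity at ref_length
    ref_length: int = 10,      # hypothetical reference loop length (residues)
) -> float:
    """Fraction of disulfides expected to form between chains rather than within one.

    Model: fraction = C / (C + EM), with EM = em_ref * (ref_length / n) ** 1.5.
    """
    if drug_conc_mM < 0 or loop_length <= 0 or em_ref_mM <= 0 or ref_length <= 0:
        raise ValueError("concentration must be >= 0 and other inputs > 0")
    effective_molarity = em_ref_mM * (ref_length / loop_length) ** 1.5
    return drug_conc_mM / (drug_conc_mM + effective_molarity)


if __name__ == "__main__":
    for conc in (1.0, 10.0, 50.0):
        for loop in (6, 10, 30):
            frac = intermolecular_fraction(conc, loop)
            print(f"conc = {conc:5.1f} mM, loop = {loop:2d} residues -> "
                  f"intermolecular fraction = {frac:.2f}")
```

Within this sketch both a longer loop and a higher concentration shift the balance toward intermolecular disulfides, which is the qualitative trend described above.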

[00199] Instead of using disulfides, one can also stabilize proteins against proteolytic degradation using a wide 
variety of crosslinking agents. Most of the agents below are sold by Pierce Chemicals under that same 
name and instructions for their use are available online (www.piercenet.com). The agents that result in the 
same chain-to-chain distance as obtained with disulfides are the most likely to be useful for this application. 
The short-linker agents such as DFDNB are the most promising. The interchain distance can be readily 
determined from the structures of the chemicals as shown in www.piercenet.com. 

[00200] There are a large number of specific chemical products that work based on the following small number of 
basic reaction schemes, all of which are described in detail at www.piercenet.com. Examples of useful 
crosslinking agents are imidoesters, active halogens, maleimide, pyridyl disulfide, NHS-ester. 
Homobifunctional crosslinking agents have two identical reactive groups and are often used in a one-step 
chemical crosslinking procedure. Examples are BS3 (a non-cleavable water-soluble DSS analog), 
BSOCOES (base-reversible), DMA (Dimethyl adipimidate-2HCl), DMP (Dimethyl pimelimidate-2HCl), 
DMS (Dimethyl suberimidate-2HCl), DSG (5-carbon analog of DSS), DSP (Lomant's reagent), DSS (non- 
cleavable), DST (cleavable by oxidizing agents), DTBP (Dimethyl 3,3'-dithiobispropionimidate-2HCl), 
DTSSP, EGS, Sulfo-EGS, THPP, and TSAT. DFDNB (1,5-difluoro-2,4-dinitrobenzene) is especially useful 
for crosslinking between small spatial distances (Kornblatt, J. A. and Lake, D.F. (1980). Cross-linking of 
cytochrome oxidase subunits with difluorodinitrobenzene. Can. J. Biochem. 58, 219-224). 

[00201] Sulfhydryl-reactive homobifunctional crosslinking agents are homobifunctional protein crosslinkers that 
react with sulfhydryls and are often based on maleimides, which react with -SH groups at pH 6.5-7.5, forming 
stable thioether linkages. BM[PEO]3 is an 8-atom polyether spacer that reduces potential for conjugate 
precipitation in sulfhydryl-to-sulfhydryl cross-linking applications. BM[PEO]4 is similar but with an 11- 
atom spacer. BMB is a non-cleavable crosslinker with a four-carbon spacer. BMDB makes a linkage that 
can be cleaved with periodate. BMH is a widely used homobifunctional sulfhydryl-reactive crosslinker. 
BMOE has an especially short linker. DPDPB and DTME are cleavable crosslinkers. HBVS does not have 
the hydrolysis potential of maleimides. TMEA is another option. Heterobifunctional crosslinking agents 
have two different reactive groups. Examples are NHS-esters and amines/hydrazines via EDC activation, 
AEDP, ASBA (photoreactive, iodinatable), EDC (water-soluble carbodiimide). Amine-sulfhydryl reactive 
bifunctional crosslinkers are AMAS, APDP, BMPS, EMCA, EMCS, GMBS, KMUA, LC-SMCC, LC- 
SPDP, MBS, SBAP, SIA (extra short), SIAB, SMCC, SMPB, SMPH, SMPT, SPDP, Sulfo-EMCS, Sulfo- 
GMBS, Sulfo-KMUS, Sulfo-LC-SMPT, Sulfo-LC-SPDP, Sulfo-MBS, Sulfo-SIAB, Sulfo-SMCC, Sulfo- 
SMPB. Amino-group reactive heterobifunctional crosslinking agents are ANB-NOS, MSA, NHS-ASA, 
SADP, SAED, SAND, SANPAH, SASD, SFAD, Sulfo-HSAB, Sulfo-NHS-LC-ASA, Sulfo-SADP, Sulfo- 
SANPAH, TFCS. 

[00202] A different slow release format has the drug labeled with a His6 tag, which is mixed and co-injected with 
Nickel-nitrilotriacetic acid-conjugated beads (Ni-NTA beads), a GMO version of the ones that are 
available from Qiagen. The drug would slowly leach off the beads, providing depot and slow release as 
illustrated in Fig. 20. The beads are optional and can be replaced by a crosslinked, polymeric Nickel- 
nitrilotriacetic acid that leads to assembly of an even larger polymer. 
[00203] URP sequences can contain sequences that are known to form multimers like alpha2D [Hill, R., et al. 
(1998) J Am Chem Soc, 120: 1138-1145] that was utilized to dimerize an antibody fragment [Kubetzko, S., 
et al. (2005) Mol Pharmacol, 68: 1439-54]. An example of a useful homodimerization peptide is the 
sequence SKVILFE. An example of useful heterodimerization sequences is the peptide ARARAR that 
can form dimers with the sequence DADADA and related sequences. Multimerization can improve the 
biological function of a molecule by increasing its avidity and it can influence pharmacokinetic properties 
and tissue distribution of the resulting MURPs. 

[00204] "Multimerization modules" are amino acid sequences that facilitate dimer or multimer formation of 
MURPs. Multimerization modules may bind to themselves to form dimers or multimers. Alternatively, 
multimerization modules can bind to other modules of the MURP. These can be leucine zippers or small 
peptides like Hydra head activator derivatives (SKVILF-like), which form antiparallel homopolymers, or 
peptides like RARARA and DADADA, which form high affinity antiparallel heteropolymers. Using one, 
two or more copies of these peptides one can force the formation of protein dimers, linear multimers or 
branched multimers. 
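
The relationship between the number of multimerization modules per protein chain and the assembly that is expected to form can be written down as a small lookup. The function name and the returned labels are illustrative assumptions; the mapping itself (one module gives dimers, two give linear multimers, more than two give branched multimers) follows the description above.

```python
def expected_assembly(modules_per_chain: int) -> str:
    """Return the assembly type expected for a given number of multimerization modules."""
    if modules_per_chain < 0:
        raise ValueError("module count cannot be negative")
    if modules_per_chain == 0:
        return "monomer"
    if modules_per_chain == 1:
        return "dimer"
    if modules_per_chain == 2:
        return "linear multimer"
    return "branched multimer"


if __name__ == "__main__":
    for count in range(5):
        print(count, "->", expected_assembly(count))
```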

[00205] The affinity of the association can be tailored by changing the type, length and composition of the peptides. 
Some applications require peptides that form homodimers as illustrated in Fig. 21. Other applications 
require heterodimers. In some cases, once associated, the peptides can be locked into place by forming 
disulfide bonds between the two protein chains, typically on either side of the peptides. Multimerization 
modules are useful for linking two MURP molecules together (head to tail, head to head, or tail to tail) as 
illustrated in Fig. 21. The multimerization modules can be located on either the N- or C-terminus in order 
to form dimers. If the multimerization modules are present at both termini, long, linear multimers will be 
formed. If more than two multimerization modules are present per protein, branched polymeric networks 
can be formed. The concepts of multimerization and chemical conjugation can be combined, leading to 
products useful for halflife extension and depot formation, leading to slow release of active drug from the depot or 
injection site as illustrated in Fig. 23. 
[00206] The subject MURPs can incorporate a generic or universal URP. One approach is to express a URP 
containing a long URP module, which provides halflife and contains multiple (typically 4-10) lysines (or 
other sites) that allow site-specific conjugation of peptides (i.e., linear, cyclic, 2SS, 3SS, etc.) that bind to a 
specific target. The advantage of this approach is that the URP module is generic and can be conjugated 
with any target-specific peptide. Ideally the linkage of the target-specific peptide to the URP is a directed 
linkage, so that residues on the URP can only react with a residue on the target-specific peptide and 
exhaustive coupling can only produce a single species, which is a URP that is linked to a peptide at every 
lysine, for example. This complex behaves like a high-avidity multimer in its binding properties but is 
simple to manufacture. This approach is illustrated in Fig. 24. 
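
Why exhaustive coupling matters can be seen by counting the positional isomers available at each degree of conjugation. The sketch below counts lysines in a URP sequence and reports, for each number of attached peptides, how many distinct species are possible. The sequence and function name are hypothetical, and the count assumes that every lysine is an independent, distinguishable conjugation site.

```python
from math import comb
from typing import Dict


def conjugation_species(urp_sequence: str, site_residue: str = "K") -> Dict[int, int]:
    """Map number of attached peptides -> number of distinct positional isomers."""
    n_sites = urp_sequence.count(site_residue)
    return {k: comb(n_sites, k) for k in range(n_sites + 1)}


if __name__ == "__main__":
    urp = "GSGSK" * 4  # hypothetical URP module with four lysines
    species = conjugation_species(urp)
    for attached, count in species.items():
        print(f"{attached} peptide(s) attached: {count} species")
    print("total possible species:", sum(species.values()))
```

Only the fully conjugated state corresponds to a single species; every intermediate degree of coupling is a mixture, which is consistent with the preference for exhaustive, directed coupling described above.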
[00207] The subject MURPs can also incorporate URPs to effect delivery across tissue barriers. URPs can be 
engineered to enhance delivery across the dermal, oral, buccal, intestinal, nasal, blood-brain, pulmonary, 
thecal, peritoneal, rectal, vaginal or many other tissue barriers. 
[00208] One of the key obstacles to oral protein delivery is the sensitivity of most proteins to proteases in the 
digestive system. Conjugation to URP sequences can improve protease resistance of pharmaceutically 
active proteins and thus facilitate their uptake. It has been shown that protein uptake in the digestive 
system can be improved by adding molecular carriers. The main role of these carriers is an improvement of 
membrane permeability [Stoll, B. R., et al. (2000) J Control Release, 64: 217-28]. Thus one can include 
sequences into URP sequences that improve membrane permeability. Many sequences that improve 
membrane permeability are known and examples are sequences rich in arginine [Takenobu, T., et al. (2002) 
Mol Cancer Ther, 1: 1043-9]. Thus one can design URP sequences that improve cellular or oral uptake of 
proteins by combining two functions, a reduction in proteolytic degradation of the protein of interest as 
well as an increase in membrane permeability of the fusion product. Optionally, one can add a sequence to the 
URP sequence that is sensitive to a protease that is preferentially located in the target tissue for the drug 
of interest but is stable to proteases in the digestive tract. Examples of such URP sequences are sequences 
that contain long regions of GRS as well as sequences that are rich in basic amino acids, in particular 
arginine, and facilitate membrane transfer. URPs can be utilized in a similar way to improve protein uptake 
via intranasal, intrapulmonary, or other routes of delivery. 
Specific product examples: 
[00209] DR4/DR5 agonist - DR4 and DR5 are death receptors that are expressed on many tumor cells. These 
receptors can be triggered by trimerization which leads to cell death and tumor regression. Binding 
domains with specificity for DR4 or DR5 can be obtained by phage panning or other display methods. 
These DR4 or DR5-specific binding domains can be multimerized using URP modules as linkers as 
illustrated in figure 12. Of particular interest are MURPs that contain three or more binding modules with 
specificity for DR4 or DR5 or both. As illustrated in Figure 12, MURPs can contain additional binding 
modules with specificity for tumor antigens that are overexpressed in tumor tissues. This allows one to 
construct MURPs that specifically accumulate in tumor tissue and trigger cell death. MURPs can contain 
modules that bind either DR4 or DR5. Of particular interest are MURPs that contain binding modules that 
bind both DR4 and DR5. 

[00210] Tumor-targeted Interleukin 2 - Interleukin 2 (IL2) is a cytokine that can enhance the immune response to 
tumor tissue. However, systemic IL2 therapy is characterized by significant side effects. MURPs can be 
constructed that combine binding domains with specificity for tumor antigens and IL2 as effector module 
as illustrated in Figure 13. Such MURPs can selectively accumulate in tumor tissue and thus elicit a tumor- 
selective immune response while minimizing the systemic side effects of cytokine therapy. Such MURPs 
can target a variety of tumor antigens like EpCAM, Her2, CEA, EGFR, Thomsen Friedenreich antigen. Of 
particular utility are MURPs that bind to tumor antigens that show slow internalization. Similar MURPs 
can be designed using other cytokines or tumor necrosis factor-alpha as effector modules. 
[00211] Tumor-selective asparaginase - Asparaginase is used to treat patients with acute leukemia. Both 
asparaginase from E. coli and asparaginase from Erwinia are used for treatment. Both enzymes can lead to 
immunogenicity and hypersensitive reactions. Oncaspar is a PEGylated version of asparaginase that has 
reduced immunogenicity. However, the protein is difficult to manufacture and is administered as a mixture of 
isomers. Adding URP sequences to termini and/or to internal loops allows the direct recombinant 
manufacture of an asparaginase variant that is homogeneous and has low immunogenicity. Various URP 
sequences and attachment sites can be compared to determine the optimum position for URP sequence 
attachment. Several other enzymes that degrade amino acids have reported antitumor activity. Examples 
are arginase, methioninase, phenylalanine ammonia lyase, and tryptophanase. Of particular interest is the 
phenylalanine ammonia lyase of Streptomyces maritimus, which has a high specific activity and does not 
require a co-factor [Calabrese, J. C., et al. (2004) Biochemistry, 43: 11403-16]. Most of these enzymes are 
of bacterial or other non-human origin and are likely to elicit immune reactions. The immunogenicity of 
these enzymes can be reduced by adding one or more URP sequences. In addition, the therapeutic index 
and PK properties of these enzymes can be improved by increasing their hydrodynamic radius as a result of 
URP sequence attachment. 

[00212] The subject MURPs can be designed to target any cellular proteins. A non-limiting list is provided below. 

[00213] VEGF, VEGF-R1, VEGF-R2, VEGF-R3, Her-1, Her-2, Her-3, EGF-1, EGF-2, EGF-3, Alpha3, cMet, 
ICOS, CD40L, LFA-1, c-Met, ICOS, LFA-1, IL-6, B7.1, B7.2, OX40, IL-1b, TACI, IgE, BAFF or BLys, 
TPO-R, CD19, CD20, CD22, CD33, CD28, IL-1-R1, TNFα, TRAIL-R1, Complement Receptor 1, FGFa, 
Osteopontin, Vitronectin, Ephrin A1-A5, Ephrin B1-B3, alpha-2-macroglobulin, CCL1, CCL2, CCL3, 
CCL4, CCL5, CCL6, CCL7, CXCL8, CXCL9, CXCL10, CXCL11, CXCL12, CCL13, CCL14, CCL15, 
CXCL16, CCL16, CCL17, CCL18, CCL19, CCL20, CCL21, CCL22, PDGF, TGFb, GMCSF, SCF, p40 
(IL12/IL23), IL1b, IL1a, IL1ra, IL2, IL3, IL4, IL5, IL6, IL8, IL10, IL12, IL15, IL23, Fas, FasL, Flt3 
ligand, 41BB, ACE, ACE-2, KGF, FGF-7, SCF, Netrin1,2, IFNa,b,g, Caspase2,3,7,8,10, ADAM 
S1,S5,8,9,15,TS1,TS5; Adiponectin, ALCAM, ALK-1, APRIL, Annexin V, Angiogenin, Amphiregulin, 
Angiopoietin1,2,4, B7-1/CD80, B7-2/CD86, B7-H1, B7-H2, B7-H3, Bcl-2, BACE-1, BAK, BCAM, 
BDNF, bNGF, bECGF, BMP2,3,4,5,6,7,8; CRP, Cadherin6,8,11; Cathepsin A,B,C,D,E,L,S,V,X; 
CD11a/LFA-1, LFA-3, GP2b3a, GH receptor, RSV F protein, IL-23 (p40, p19), IL-12, CD80, CD86, 
CD28, CTLA-4, α4β1, α4β7, TNF/Lymphotoxin, IgE, CD3, CD20, IL-6, IL-6R, BLYS/BAFF, IL-2R, 
HER2, EGFR, CD33, CD52, Digoxin, Rho (D), Varicella, Hepatitis, CMV, Tetanus, Vaccinia, Antivenom, 
Botulinum, Trail-R1, Trail-R2, cMet, TNF-R family, such as LA NGF-R, CD27, CD30, CD40, CD95, 
Lymphotoxin a/b receptor, WSL-1, TL1A/TNFSF15, BAFF, BAFF-R/TNFRSF13C, TRAIL 
R2/TNFRSF10B, TRAIL R2/TNFRSF10B, Fas/TNFRSF6, CD27/TNFRSF7, DR3/TNFRSF25, 
HVEM/TNFRSF14, TROY/TNFRSF19, CD40 Ligand/TNFSF5, BCMA/TNFRSF17, CD30/TNFRSF8, 
LIGHT/TNFSF14, 4-1BB/TNFRSF9, CD40/TNFRSF5, GITR/TNFRSF18, Osteoprotegerin/TNFRSF11B, 
RANK/TNFRSF11A, TRAIL R3/TNFRSF10C, TRAIL/TNFSF10, TRANCE/RANK L/TNFSF11, 4-1BB 
Ligand/TNFSF9, TWEAK/TNFSF12, CD40 Ligand/TNFSF5, Fas Ligand/TNFSF6, RELT/TNFRSF19L, 
APRIL/TNFSF13, DcR3/TNFRSF6B, TNF RI/TNFRSF1A, TRAIL R1/TNFRSF10A, TRAIL 
R4/TNFRSF10D, CD30 Ligand/TNFSF8, GITR Ligand/TNFSF18, TNFSF18, TACI/TNFRSF13B, NGF 
R/TNFRSF16, OX40 Ligand/TNFSF4, TRAIL R2/TNFRSF10B, TRAIL R3/TNFRSF10C, TWEAK 
R/TNFRSF12, BAFF/BLyS/TNFSF13, DR6/TNFRSF21, TNF-alpha/TNFSF1A, Pro-TNF- 
alpha/TNFSF1A, Lymphotoxin beta R/TNFRSF3, Lymphotoxin beta R (LTbR)/Fc Chimera, TNF 
RI/TNFRSF1A, TNF-beta/TNFSF1B, PGRP-S, TNF RI/TNFRSF1A, TNF RII/TNFRSF1B, EDA-A2, 
TNF-alpha/TNFSF1A, EDAR, XEDAR, TNF RI/TNFRSF1A. 
f00214] Of particular interest are human target proteins that are commercially available in purified form. Examples 
are: 4EBP1, 14-3-3 zeta, 53BP1, 2B4/SLAMF4, CCL21/6Ckine, 4-1BB/TNFRSF9, 8D6A, 4-1BB 
Ligand/TNFSF9, S-oxo-dG, 4 -Amino- 1 ,8-naphthalirnide, A2B5, Aminopeptidase LRAP/ERAP2 , A33, 
Aminopeptidase N/ANPEP, Aag, Aminopeptidase P2/XPNPEP2, ABCG2, Aminopeptidase P1/XPNPEP1, 
ACE, Aminopeptidase PILS/ARTS1, ACE-2, Amnionless, Actin, Amphiregulin, beta-Actin, AMPK alpha 
1/2, Activin A, AMPK alpha 1, Activin AB, AMPK alpha 2, Activin B, AMPK beta 1, Activin C, AMPK 
beta 2, Activin RIA/ALK-2, Androgen R/NR3C4, Activin RIB/ALK-4, Angiogenic Activin RIIA, 
Angiopoietin-1, Activin RIIB, Angiopoietin-2, ADAM8, Angiopoietin-3, ADAM9, Angiopoietin-4, 
ADAM 10, Angiopoietin-like 1, ADAM 12, Angiopoietin-like 2, ADAM 15, Angiopoietin-like 3, 
TACE/AD AM 1 7 , Angiopoietin-like 4, ADAM 19, Angiopoietin-like 7/CDT6, ADAM33, Angiostatin, 
ADAMTS4, Annexin AI/Annexin I, ADAMTS5, Annexin A7, ADAMTS1, Annexin A 10, ADAMTSL- 
1/Punctin, Annexin V, Adiponectin/Acrp30, ANP, AEBSF, AP Site, Aggrecan, APAF-1, Agrin, APC, 
AgRP, APE, AGTR-2, APJ, AIF, APLP-1, Akt, APLP-2, Aktl, Apolipoprotein AI, Akt2, Apolipoprotein 
B, Akt3, APP, Serum Albumin, APRIL/TNFSF 1 3 , ALCAM, ARC, ALK-1, Arternin, ALK-7, 
Arylsulfatase A/ARSA, Alkaline Phosphatase, ASAH2/N-acylsphingosine Amidohydrolase-2, alpha 2u- 
Globulin, ASC, alpha-1-Acid Glycoprotein, ASGR1, alpha-Fetoprotein, ASK1, ALS, ATM, Ameloblastic
ATRIP, AMICA/JAML, Aurora A, AMIGO, Aurora B, AMIGO2, Axin-1, AMIGO3, Axl,
Aminoacylase/ACY1, Azurocidin/CAP37/HBP, Aminopeptidase A/ENPEP, B4GALT1, BIM, B7-1/CD80,
6-Biotin-17-NAD, B7-2/CD86, BLAME/SLAMF8, B7-H1/PD-L1, CXCL13/BLC/BCA-1, B7-H2,
BLIMP1, B7-H3, Blk, B7-H4, BMI-1, BACE-1, BMP-1/PCP, BACE-2, BMP-2, Bad, BMP-3,
BAFF/TNFSF13B, BMP-3b/GDF-10, BAFF R/TNFRSF13C, BMP-4, Bag-1, BMP-5, BAK, BMP-6,
BAMBI/NMA, BMP-7, BARD1, BMP-8, Bax, BMP-9, BCAM, BMP-10, Bcl-10, BMP-15/GDF-9B, Bcl-
2, BMPR-IA/ALK-3, Bcl-2 related protein A1, BMPR-IB/ALK-6, Bcl-w, BMPR-II, Bcl-x, BNIP3L, Bcl-
xL, BOC, BCMA/TNFRSF17, BOK, BDNF, BPDE, Benzamide, Brachyury, Common beta Chain, B-Raf, 
beta IG-H3, CXCL14/BRAK, Betacellulin, BRCA1, beta-Defensin 2, BRCA2, BID, BTLA, Biglycan,
Bub-1, Bik-like Killer Protein, c-jun, CD90/Thy1, c-Rel, CD94, CCL6/C10, CD97, C1q R1/CD93, CD151,
C1qTNF1, CD160, C1qTNF4, CD163, C1qTNF5, CD164, Complement Component C1r, CD200,
Complement Component C1s, CD200 R1, Complement Component C2, CD229/SLAMF3, Complement
Component C3a, CD23/Fc epsilon RII, Complement Component C3d, CD2F-10/SLAMF9, Complement
Component C5a, CD5L, Cadherin-4/R-Cadherin, CD69, Cadherin-6, CDC2, Cadherin-8, CDC25A,
Cadherin-11, CDC25B, Cadherin-12, CDCP1, Cadherin-13, CDO, Cadherin-17, CDX4, E-Cadherin,
CEACAM-1/CD66a, N-Cadherin, CEACAM-6, P-Cadherin, Cerberus 1, VE-Cadherin, CFTR, Calbindin
D, cGMP, Calcineurin A, Chem R23, Calcineurin B, Chemerin, Calreticulin-2, Chemokine Sampler Packs,
CaM Kinase II, Chitinase 3-like 1, cAMP, Chitotriosidase/CHIT1, Cannabinoid R1, Chk1, Cannabinoid
R2/CB2/CNR2, Chk2, CAR/NR1I3, CHL-1/L1CAM-2, Carbonic Anhydrase I, Choline 
Acetyltransferase/ChAT, Carbonic Anhydrase II, Chondrolectin, Carbonic Anhydrase III, Chordin,
Carbonic Anhydrase IV, Chordin-Like 1, Carbonic Anhydrase VA, Chordin-Like 2, Carbonic Anhydrase
VB, CINC-1, Carbonic Anhydrase VI, CINC-2, Carbonic Anhydrase VII, CINC-3, Carbonic Anhydrase
VIII, Claspin, Carbonic Anhydrase IX, Claudin-6, Carbonic Anhydrase X, CLC, Carbonic Anhydrase
XII, CLEC-1, Carbonic Anhydrase XIII, CLEC-2, Carbonic Anhydrase XIV, CLECSF13/CLEC4F,
Carboxymethyl Lysine, CLECSF8, Carboxypeptidase A1/CPA1, CLF-1, Carboxypeptidase A2, CL-
P1/COLEC12, Carboxypeptidase A4, Clusterin, Carboxypeptidase B1, Clusterin-like 1, Carboxypeptidase
E/CPE, CMG-2, Carboxypeptidase X1, CMV UL146, Cardiotrophin-1, CMV UL147, Carnosine
Dipeptidase 1, CNP, Caronte, CNTF, CART, CNTF R alpha, Caspase, Coagulation Factor II/Thrombin,
Caspase-1, Coagulation Factor III/Tissue Factor, Caspase-2, Coagulation Factor VII, Caspase-3,
Coagulation Factor X, Caspase-4, Coagulation Factor XI, Caspase-6, Coagulation Factor XIV/Protein C,
Caspase-7, COCO, Caspase-8, Cohesin, Caspase-9, Collagen I, Caspase-10, Collagen II, Caspase-12,
Collagen IV, Caspase-13, Common gamma Chain/IL-2 R gamma, Caspase Peptide Inhibitors,
COMP/Thrombospondin-5, Catalase, Complement Component C1rLP, beta-Catenin, Complement
Component C1qA, Cathepsin I, Complement Component C1qC, Cathepsin 3, Complement Factor D,
Cathepsin 6, Complement Factor I, Cathepsin A, Complement MASP3, Cathepsin B, Connexin 43,
Cathepsin C/DPPI, Contactin-1, Cathepsin D, Contactin-2/TAG1, Cathepsin E, Contactin-4, Cathepsin F,
Contactin-5, Cathepsin H, Corin, Cathepsin L, Cornulin, Cathepsin O, CORS26/C1qTNF3, Cathepsin S,
Rat Cortical Stem Cells, Cathepsin V, Cortisol, Cathepsin X/Z/P, COUP-TF I/NR2F1, CBP, COUP-TF
II/NR2F2, CCI, COX-1, CCK-A R, COX-2, CCL28, CRACC/SLAMF7, CCR1, C-Reactive Protein,
CCR2, Creatine Kinase, Muscle/CKMM, CCR3, Creatinine, CCR4, CREB, CCR5, CREG, CCR6,
CRELD1, CCR7, CRELD2, CCR8, CRHBP, CCR9, CRHR-1, CCR10, CRIM1, CD155/PVR, Cripto,
CD2, CRISP-2, CD3, CRISP-3, CD4, Crossveinless-2, CD4+/45RA-, CRTAM, CD4+/45RO-, CRTH-2, 
CD4+/CD62L-/CD44, CRY1, CD4+/CD62L+/CD44, Cryptic, CD5, CSB/ERCC6, CD6, CCL27/CTACK,
CD8, CTGF/CCN2, CD8+/45RA-, CTLA-4, CD8+/45RO-, Cubilin, CD9, CX3CR1, CD14, CXADR,
CD27/TNFRSF7, CXCL16, CD27 Ligand/TNFSF7, CXCR3, CD28, CXCR4, CD30/TNFRSF8, CXCR5,
CD30 Ligand/TNFSF8, CXCR6, CD31/PECAM-1, Cyclophilin A, CD34, Cyr61/CCN1, CD36/SR-B3,
Cystatin A, CD38, Cystatin B, CD40/TNFRSF5, Cystatin C, CD40 Ligand/TNFSF5, Cystatin D, CD43,
Cystatin E/M, CD44, Cystatin F, CD45, Cystatin H, CD46, Cystatin H2, CD47, Cystatin S, 
CD48/SLAMF2, Cystatin SA, CD55/DAF, Cystatin SN, CD58/LFA-3, Cytochrome c, CD59, 
Apocytochrome c, CD68, Holocytochrome c, CD72, Cytokeratin 8, CD74, Cytokeratin 14, CD83, 
Cytokeratin 19, CD84/SLAMF5, Cytonin, D6, DISP1, DAN, Dkk-1, DANCE, Dkk-2, DARPP-32, Dkk-3, 
DAX1/NR0B1, Dkk-4, DCC, DLEC, DCIR/CLEC4A, DLL1, DCAR, DLL4, DcR3/TNFRSF6B, d-
Luciferin, DC-SIGN, DNA Ligase IV, DC-SIGNR/CD299, DNA Polymerase beta, DcTRAIL 
R1/TNFRSF23, DNAM-1, DcTRAIL R2/TNFRSF22, DNA-PKcs, DDR1, DNER, DDR2, Dopa 
Decarboxylase/DDC, DEC-205, DPCR-1, Decapentaplegic, DPP6, Decorin, DPPA4, Dectin-1/CLEC7A, 
DPPA5/ESG1, Dectin-2/CLEC6A, DPPII/QPP/DPP7, DEP-1/CD148, DPPIV/CD26, Desert Hedgehog, 
DR3/TNFRSF25, Desmin, DR6/TNFRSF21, Desmoglein-1, DSCAM, Desmoglein-2, DSCAM-L1,
Desmoglein-3, DSPG3, Dishevelled-1, Dtk, Dishevelled-3, Dynamin, EAR2/NR2F6, EphA5, ECE-1,
EphA6, ECE-2, EphA7, ECF-L/CHI3L3, EphA8, ECM-1, EphB1, Ecotin, EphB2, EDA, EphB3, EDA-A2,
EphB4, EDAR, EphB6, EDG-1, Ephrin, EDG-5, Ephrin-A1, EDG-8, Ephrin-A2, eEF-2, Ephrin-A3, EGF,
Ephrin-A4, EGF R, Ephrin-A5, EGR1, Ephrin-B, EG-VEGF/PK1, Ephrin-B1, eIF2 alpha, Ephrin-B2,
eIF4E, Ephrin-B3, Elk-1, Epigen, EMAP-II, Epimorphin/Syntaxin 2, EMMPRIN/CD147, Epiregulin,
CXCL5/ENA, EPR-1/Xa Receptor, Endocan, ErbB2, Endoglin/CD105, ErbB3, Endoglycan, ErbB4,
Endonuclease III, ERCC1, Endonuclease IV, ERCC3, Endonuclease V, ERK1/ERK2, Endonuclease VIII,
ERK1, Endorepellin/Perlecan, ERK2, Endostatin, ERK3, Endothelin-1, ERK5/BMK1, Engrailed-2, ERR
alpha/NR3B1, EN-RAGE, ERR beta/NR3B2, Enteropeptidase/Enterokinase, ERR gamma/NR3B3,
CCL11/Eotaxin, Erythropoietin, CCL24/Eotaxin-2, Erythropoietin R, CCL26/Eotaxin-3, ESAM,
EpCAM/TROP-1, ER alpha/NR3A1, EPCR, ER beta/NR3A2, Eph, Exonuclease III, EphA1, Exostosin-
like 2/EXTL2, EphA2, Exostosin-like 3/EXTL3, EphA3, FABP1, FGF-BP, FABP2, FGF R1-4, FABP3,
FGF R1, FABP4, FGF R2, FABP5, FGF R3, FABP7, FGF R4, FABP9, FGF R5, Complement Factor B,
Fgr, FADD, FHR5, FAM3A, Fibronectin, FAM3B, Ficolin-2, FAM3C, Ficolin-3, FAM3D, FITC, 
Fibroblast Activation Protein alpha/FAP, FKBP38, Fas/TNFRSF6, Flap, Fas Ligand/TNFSF6, FLIP, 
FATP1, FLRG, FATP4, FLRT1, FATP5, FLRT2, Fc gamma RI/CD64, FLRT3, Fc gamma RIIB/CD32b, 
Flt-3, Fc gamma RIIC/CD32c, Flt-3 Ligand, Fc gamma RIIA/CD32a, Follistatin, Fc gamma RIII/CD16, 
Follistatin-like 1, FcRH1/IRTA5, FosB/G0S3, FcRH2/IRTA4, FoxD3, FcRH4/IRTA1, FoxJ1,
FcRH5/IRTA2, FoxP3, Fc Receptor-like 3/CD16-2, Fpg, FEN-1, FPR1, Fetuin A, FPRL1, Fetuin B,
FPRL2, FGF acidic, CX3CL1/Fractalkine, FGF basic, Frizzled-1, FGF-3, Frizzled-2, FGF-4, Frizzled-3,
FGF-5, Frizzled-4, FGF-6, Frizzled-5, FGF-8, Frizzled-6, FGF-9, Frizzled-7, FGF-10, Frizzled-8, FGF-11,
Frizzled-9, FGF-12, Frk, FGF-13, sFRP-1, FGF-16, sFRP-2, FGF-17, sFRP-3, FGF-19, sFRP-4, FGF-20, 
Furin, FGF-21, FXR/NR1H4, FGF-22, Fyn, FGF-23, G9a/EHMT2, GFR alpha-3/GDNF R alpha-3, 
GABA-A-R alpha 1, GFR alpha-4/GDNF R alpha-4, GABA-A-R alpha 2, GITR/TNFRSF18, GABA-A-R
alpha 4, GITR Ligand/TNFSF18, GABA-A-R alpha 5, GLI-1, GABA-A-R alpha 6, GLI-2, GABA-A-R
beta 1, GLP/EHMT1, GABA-A-R beta 2, GLP-1 R, GABA-A-R beta 3, Glucagon, GABA-A-R gamma 2,
Glucosamine (N-acetyl)-6-Sulfatase/GNS, GABA-B-R2, GluR1, GAD1/GAD67, GluR2/3, GAD2/GAD65,
GluR2, GADD45 alpha, GluR3, GADD45 beta, Glut1, GADD45 gamma, Glut2, Galectin-1, Glut3,
Galectin-2, Glut4, Galectin-3, Glut5, Galectin-3 BP, Glutaredoxin 1, Galectin-4, Glycine R, Galectin-7,
Glycophorin A, Galectin-8, Glypican 2, Galectin-9, Glypican 3, GalNAc4S-6ST, Glypican 5, GAP-43,
Glypican 6, GAPDH, GM-CSF, Gas1, GM-CSF R alpha, Gas6, GMF-beta, GASP-1/WFIKKNRP, gp130,
GASP-2/WFIKKN, Glycogen Phosphorylase BB/GPBB, GATA-1, GPR15, GATA-2, GPR39, GATA-3, 
GPVI, GATA-4, GR/NR3C1, GATA-5, Gr-1/Ly-6G, GATA-6, Granulysin, GBL, Granzyme A,
GCNF/NR6A1, Granzyme B, CXCL6/GCP-2, Granzyme D, G-CSF, Granzyme G, G-CSF R, Granzyme H,
GDF-1, GRASP, GDF-3, GRB2, GDF-5, Gremlin, GDF-6, GRO, GDF-7, CXCL1/GRO alpha, GDF-8,
CXCL2/GRO beta, GDF-9, CXCL3/GRO gamma, GDF-11, Growth Hormone, GDF-15, Growth Hormone
R, GDNF, GRP75/HSPA9B, GFAP, GSK-3 alpha/beta, GFI-1, GSK-3 alpha, GFR alpha-1/GDNF R
alpha-1, GSK-3 beta, GFR alpha-2/GDNF R alpha-2, EZFIT, H2AX, Histidine, H60, HM74A, HAI-1,
HMGA2, HAI-2, HMGB1, HAI-2A, TCF-2/HNF-1 beta, HAI-2B, HNF-3 beta/FoxA2, HAND1, HNF-4 
alpha/NR2A1, HAPLN1, HNF-4 gamma/NR2A2, Airway Trypsin-like Protease/HAT, HO-
1/HMOX1/HSP32, HB-EGF, HO-2/HMOX2, CCL14a/HCC-1, HPRG, CCL14b/HCC-3, Hrk,
CCL16/HCC-4, HRP-1, alpha HCG, HS6ST2, Hck, HSD-1, HCR/CRAM-A/B, HSD-2, HDGF,
HSP10/EPF, Hemoglobin, HSP27, Hepassocin, HSP60, HES-1, HSP70, HES-4, HSP90, HGF,
HTRA/Protease Do, HGF Activator, HTRA1/PRSS11, HGF R, HTRA2/Omi, HIF-1 alpha,
HVEM/TNFRSF14, HIF-2 alpha, Hyaluronan, HIN-1/Secretoglobulin 3A1, 4-Hydroxynonenal,
Hip, CCL1/I-309/TCA-3, IL-10, cIAP (pan), IL-10 R alpha, cIAP-1/HIAP-2, IL-10 R beta, cIAP-2/HIAP-1,
IL-11, IBSP/Sialoprotein II, IL-11 R alpha, ICAM-1/CD54, IL-12, ICAM-2/CD102, IL-12/IL-23 p40,
ICAM-3/CD50, IL-12 R beta 1, ICAM-5, IL-12 R beta 2, ICAT, IL-13, ICOS, IL-13 R alpha 1, Iduronate
2-Sulfatase/IDS, IL-13 R alpha 2, IFN, IL-15, IFN-alpha, IL-15 R alpha, IFN-alpha 1, IL-16, IFN-alpha 2, 
IL-17, IFN-alpha 4b, IL-17 R, IFN-alpha A, IL-17 RC, IFN-alpha B2, IL-17 RD, IFN-alpha C, IL-17B,
IFN-alpha D, IL-17B R, IFN-alpha F, IL-17C, IFN-alpha G, IL-17D, IFN-alpha H2, IL-17E, IFN-alpha
I, IL-17F, IFN-alpha J1, IL-18/IL-1F4, IFN-alpha K, IL-18 BPa, IFN-alpha WA, IL-18 BPc, IFN-
alpha/beta R1, IL-18 BPd, IFN-alpha/beta R2, IL-18 R alpha/IL-1 R5, IFN-beta, IL-18 R beta/IL-1 R7,
IFN-gamma, IL-19, IFN-gamma R1, IL-20, IFN-gamma R2, IL-20 R alpha, IFN-omega, IL-20 R beta,
IgE, IL-21, IGFBP-1, IL-21 R, IGFBP-2, IL-22, IGFBP-3, IL-22 R, IGFBP-4, IL-22BP, IGFBP-5, IL- 
23, IGFBP-6, IL-23 R, IGFBP-L1, IL-24, IGFBP-rp1/IGFBP-7, IL-26/AK155, IGFBP-rP10, IL-27, IGF-I,
IL-28A, IGF-I R, IL-28B, IGF-II, IL-29/IFN-lambda 1, IGF-II R, IL-31, IgG, IL-31 RA, IgM, IL-32 alpha,
IGSF2, IL-33, IGSF4A/SynCAM, ILT2/CD85j, IGSF4B, ILT3/CD85k, IGSF8, ILT4/CD85d, IgY,
ILT5/CD85a, IkB-beta, ILT6/CD85e, IKK alpha, Indian Hedgehog, IKK epsilon, INSRR, IKK gamma,
Insulin, IL-1 alpha/IL-1F1, Insulin R/CD220, IL-1 beta/IL-1F2, Proinsulin, IL-1ra/IL-1F3, Insulysin/IDE,
IL-1F5/FIL1 delta, Integrin alpha 2/CD49b, IL-1F6/FIL1 epsilon, Integrin alpha 3/CD49c, IL-1F7/FIL1
zeta, Integrin alpha 3 beta 1/VLA-3, IL-1F8/FIL1 eta, Integrin alpha 4/CD49d, IL-1F9/IL-1 H1, Integrin
alpha 5/CD49e, IL-1F10/IL-1HY2, Integrin alpha 5 beta 1, IL-1 RI, Integrin alpha 6/CD49f, IL-1 RII, 
Integrin alpha 7, IL-1 R3/IL-1 R AcP, Integrin alpha 9, IL-1 R4/ST2, Integrin alpha E/CD103, IL-1 R6/IL-
1 R rp2, Integrin alpha L/CD11a, IL-1 R8, Integrin alpha L beta 2, IL-1 R9, Integrin alpha M/CD11b, IL-2,
Integrin alpha M beta 2, IL-2 R alpha, Integrin alpha V/CD51, IL-2 R beta, Integrin alpha V beta 5, IL-3,
Integrin alpha V beta 3, IL-3 R alpha, Integrin alpha V beta 6, IL-3 R beta, Integrin alpha X/CD11c, IL-4,
Integrin beta 1/CD29, IL-4 R, Integrin beta 2/CD18, IL-5, Integrin beta 3/CD61, IL-5 R alpha, Integrin beta
5, IL-6, Integrin beta 6, IL-6 R, Integrin beta 7, IL-7, CXCL10/IP-10/CRG-2, IL-7 R alpha/CD127,
IRAK1, CXCR1/IL-8 RA, IRAK4, CXCR2/IL-8 RB, IRS-1, CXCL8/IL-8, Islet-1, IL-9, CXCL11/I-TAC,
IL-9 R, Jagged 1, JAM-4/IGSF5, Jagged 2, JNK, JAM-A, JNK1/JNK2, JAM-B/VE-JAM, JNK1, JAM-C, 
JNK2, Kininogen, Kallikrein 3/PSA, Kininostatin, Kallikrein 4, KIR/CD158, Kallikrein 5, KIR2DL1,
Kallikrein 6/Neurosin, KIR2DL3, Kallikrein 7, KIR2DL4/CD158d, Kallikrein 8/Neuropsin, KIR2DS4,
Kallikrein 9, KIR3DL1, Plasma Kallikrein/KLKB1, KIR3DL2, Kallikrein 10, Kirrel2, Kallikrein 11,
KLF4, Kallikrein 12, KLF5, Kallikrein 13, KLF6, Kallikrein 14, Klotho, Kallikrein 15, Klotho beta, KC,
KOR, Keap1, Kremen-1, Kell, Kremen-2, KGF/FGF-7, LAG-3, LINGO-2, LAIR1, Lipin 2, LAIR2,
Lipocalin-1, Laminin alpha 4, Lipocalin-2/NGAL, Laminin gamma 1, 5-Lipoxygenase, Laminin I, LXR
alpha/NR1H3, Laminin S, LXR beta/NR1H2, Laminin-1, Livin, Laminin-5, LIX, LAMP,
LMIR1/CD300A, Langerin, LMIR2/CD300c, LAR, LMIR3/CD300LF, Latexin, LMIR5/CD300LB,
Layilin, LMIR6/CD300LE, LBP, LMO2, LDL R, LOX-1/SR-E1, LECT2, LRH-1/NR5A2, LEDGF,
LRIG1, Lefty, LRIG3, Lefty-1, LRP-1, Lefty-A, LRP-6, Legumain, LSECtin/CLEC4G, Leptin, Lumican, 
Leptin R, CXCL15/Lungkine, Leukotriene B4, XCL1/Lymphotactin, Leukotriene B4 R1, Lymphotoxin,
LIF, Lymphotoxin beta/TNFSF3, LIF R alpha, Lymphotoxin beta R/TNFRSF3, LIGHT/TNFSF14, Lyn,
Limitin, Lyp, LIMPII/SR-B2, Lysyl Oxidase Homolog 2, LIN-28, LYVE-1, LINGO-1, alpha 2-
Macroglobulin, CXCL9/MIG, MAD2L1, Mimecan, MAdCAM-1, Mindin, MafB, Mineralocorticoid
R/NR3C2, MafF, CCL3L1/MIP-1 alpha Isoform LD78 beta, MafG, CCL3/MIP-1 alpha, MafK,
CCL4L1/LAG-1, MAG/Siglec-4a, CCL4/MIP-1 beta, MANF, CCL15/MIP-1 delta, MAP2,
CCL9/10/MIP-1 gamma, MAPK, MIP-2, Marapsin/Pancreasin, CCL19/MIP-3 beta, MARCKS,
CCL20/MIP-3 alpha, MARCO, MIP-I, Mash1, MIP-II, Matrilin-2, MIP-III, Matrilin-3, MIS/AMH,






Matrilin-4, MIS RII, Matriptase/ST14, MIXL1, MBL, MKK3/MKK6, MBL-2, MKK3, Melanocortin 
3R/MC3R, MKK14, MCAM/CD146, MKK6, MCK-2, MKK7, Mcl-1, MKP-3, MCP-6, MLH-1, 
CCL2/MCP-1, MLK4 alpha, MCP-11, MMP, CCL8/MCP-2, MMP-1, CCL7/MCP-3/MARC, MMP-2,
CCL13/MCP-4, MMP-3, CCL12/MCP-5, MMP-7, M-CSF, MMP-8, M-CSF R, MMP-9, MCV-type II,
MMP-10, MD-1, MMP-11, MD-2, MMP-12, CCL22/MDC, MMP-13, MDL-1/CLEC5A, MMP-14,
MDM2, MMP-15, MEA-1, MMP-16/MT3-MMP, MEK1/MEK2, MMP-24/MT5-MMP, MEK1, MMP-
25/MT6-MMP, MEK2, MMP-26, Melusin, MMR, MEPE, MOG, Meprin alpha, CCL23/MPIF-1, Meprin
beta, M-Ras/R-Ras3, Mer, Mre11, Mesothelin, MRP1, Meteorin, MSK1/MSK2, Methionine
Aminopeptidase 1, MSK1, Methionine Aminopeptidase, MSK2, Methionine Aminopeptidase 2, MSP, 
MFG-E8, MSP R/Ron, MFRP, Mug, MgcRacGAP, MULT-1, MGL2, Musashi-1, MGMT, Musashi-2, 
MIA, MuSK, MICA, MutY DNA Glycosylase, MICB, MyD88, MICL/CLEC12A, Myeloperoxidase, beta 
2 Microglobulin, Myocardin, Midkine, Myocilin, MIF, Myoglobin, NAIP, NGFI-B gamma/NR4A3,
Nanog, NgR2/NgRH1, CXCL7/NAP-2, NgR3/NgRH2, Nbs1, Nidogen-1/Entactin, NCAM-1/CD56,
Nidogen-2, NCAM-L1, Nitric Oxide, Nectin-1, Nitrotyrosine, Nectin-2/CD112, NKG2A, Nectin-3,
NKG2C, Nectin-4, NKG2D, Neogenin, NKp30, Neprilysin/CD10, NKp44, Neprilysin-
2/MMEL1/MMEL2, NKp46/NCR1, Nestin, NKp80/KLRF1, NETO2, NKX2.5, Netrin-1, NMDA R, NR1
Subunit, Netrin-2, NMDA R, NR2A Subunit, Netrin-4, NMDA R, NR2B Subunit, Netrin-G1a, NMDA R,
NR2C Subunit, Netrin-G2a, N-Me-6,7-diOH-TIQ, Neuregulin-1/NRG1, Nodal, Neuregulin-3/NRG3, 
Noggin, Neuritin, Nogo Receptor, NeuroD1, Nogo-A, Neurofascin, NOMO, Neurogenin-1, Nope,
Neurogenin-2, Norrin, Neurogenin-3, eNOS, Neurolysin, iNOS, Neurophysin II, nNOS, Neuropilin-1,
Notch-1, Neuropilin-2, Notch-2, Neuropoietin, Notch-3, Neurotrimin, Notch-4, Neurturin, NOV/CCN3,
NFAM1, NRAGE, NF-H, NrCAM, NFkB1, NRL, NFkB2, NT-3, NF-L, NT-4, NF-M, NTB-
A/SLAMF6, NG2/MCSP, NTH1, NGF R/TNFRSF16, Nucleostemin, beta-NGF, Nurr-1/NR4A2, NGFI-B
alpha/NR4A1, OAS2, Orexin B, OBCAM, OSCAR, OCAM, OSF-2/Periostin, OCIL/CLEC2d, Oncostatin
M/OSM, OCILRP2/CLEC2i, OSM R beta, Oct-3/4, Osteoactivin/GPNMB, OGG1, Osteoadherin, Olig 1,
2, 3, Osteocalcin, Olig1, Osteocrin, Olig2, Osteopontin, Olig3, Osteoprotegerin/TNFRSF11B,
Oligodendrocyte Marker O1, Otx2, Oligodendrocyte Marker O4, OV-6, OMgp, OX40/TNFRSF4, Opticin,
OX40 Ligand/TNFSF4, Orexin A, OAS2, Orexin B, OBCAM, OSCAR, OCAM, OSF-2/Periostin,
OCIL/CLEC2d, Oncostatin M/OSM, OCILRP2/CLEC2i, OSM R beta, Oct-3/4, Osteoactivin/GPNMB,
OGG1, Osteoadherin, Olig 1, 2, 3, Osteocalcin, Olig1, Osteocrin, Olig2, Osteopontin, Olig3,
Osteoprotegerin/TNFRSF11B, Oligodendrocyte Marker O1, Otx2, Oligodendrocyte Marker O4, OV-6,
OMgp, OX40/TNFRSF4, Opticin, OX40 Ligand/TNFSF4, Orexin A, RACK1, Ret, Rad1, REV-ERB
alpha/NR1D1, Rad17, REV-ERB beta/NR1D2, Rad51, Rex-1, Rae-1, RGM-A, Rae-1 alpha, RGM-B, Rae-
1 beta, RGM-C, Rae-1 delta, Rheb, Rae-1 epsilon, Ribosomal Protein S6, Rae-1 gamma, RIP1, Raf-1,
ROBO1, RAGE, ROBO2, RalA/RalB, ROBO3, RalA, ROBO4, RalB, ROR/NR1F1-3 (pan),
RANK/TNFRSF11A, ROR alpha/NR1F1, CCL5/RANTES, ROR gamma/NR1F3, Rap1A/B, RTK-like
Orphan Receptor 1/ROR1, RAR alpha/NR1B1, RTK-like Orphan Receptor 2/ROR2, RAR beta/NR1B2,
RP105, RAR gamma/NR1B3, RPA2, Ras, RSK (pan), RBP4, RSK1/RSK2, RECK, RSK1, Reg 2/PAP,
RSK2, Reg I, RSK3, Reg II, RSK4, Reg III, R-Spondin 1, Reg IIIa, R-Spondin 2, Reg IV, R-Spondin 3,
Relaxin-1, RUNX1/CBFA2, Relaxin-2, RUNX2/CBFA1, Relaxin-3, RUNX3/CBFA3, RELM alpha, RXR
alpha/NR2B1, RELM beta, RXR beta/NR2B2, RELT/TNFRSF19L, RXR gamma/NR2B3, Resistin,
S100A10, SLITRK5, S100A8, SLPI, S100A9, SMAC/Diablo, S100B, Smad1, S100P, Smad2, SALL1,
Smad3, delta-Sarcoglycan, Smad4, Sca-1/Ly6, Smad5, SCD-1, Smad7, SCF, Smad8, SCF R/c-kit, SMC1,
SCGF, alpha-Smooth Muscle Actin, SCL/Tal1, SMUG1, SCP3/SYCP3, Snail, CXCL12/SDF-1, Sodium
Calcium Exchanger 1, SDNSF/MCFD2, Soggy-1, alpha-Secretase, Sonic Hedgehog, gamma-Secretase,
SorCS1, beta-Secretase, SorCS3, E-Selectin, Sortilin, L-Selectin, SOST, P-Selectin, SOX1, Semaphorin
3A, SOX2, Semaphorin 3C, SOX3, Semaphorin 3E, SOX7, Semaphorin 3F, SOX9, Semaphorin 6A,
SOX10, Semaphorin 6B, SOX17, Semaphorin 6C, SOX21, Semaphorin 6D, SPARC, Semaphorin 7A,
SPARC-like 1, Separase, SP-D, Serine/Threonine Phosphatase Substrate I, Spinesin, Serpin A1, F-Spondin,
Serpin A3, SR-AI/MSR, Serpin A4/Kallistatin, Src, Serpin A5/Protein C Inhibitor, SREC-I/SR-F1, Serpin
A8/Angiotensinogen, SREC-II, Serpin B5, SSEA-1, Serpin C1/Antithrombin-III, SSEA-3, Serpin
D1/Heparin Cofactor II, SSEA-4, Serpin E1/PAI-1, ST7/LRP12, Serpin E2, Stabilin-1, Serpin F1, Stabilin-
2, Serpin F2, Stanniocalcin 1, Serpin G1/C1 Inhibitor, Stanniocalcin 2, Serpin I2, STAT1, Serum Amyloid
A1, STAT2, SF-1/NR5A1, STAT3, SGK, STAT4, SHBG, STAT5a/b, SHIP, STAT5a, SHP/NR0B2,
STAT5b, SHP-1, STAT6, SHP-2, VE-Statin, SIGIRR, Stella/Dppa3, Siglec-2/CD22, STRO-1, Siglec-
3/CD33, Substance P, Siglec-5, Sulfamidase/SGSH, Siglec-6, Sulfatase Modifying Factor 1/SUMF1, 
Siglec-7, Sulfatase Modifying Factor 2/SUMF2, Siglec-9, SUMO1, Siglec-10, SUMO2/3/4, Siglec-11,
SUMO3, Siglec-F, Superoxide Dismutase, SIGNR1/CD209, Superoxide Dismutase-1/Cu-Zn SOD,
SIGNR4, Superoxide Dismutase-2/Mn-SOD, SIRP beta 1, Superoxide Dismutase-3/EC-SOD, SKI,
Survivin, SLAM/CD150, Synapsin I, Sleeping Beauty Transposase, Syndecan-1/CD138, SliG, Syndecan-2,
SLITRK1, Syndecan-3, SLITRK2, Syndecan-4, SLITRK4, TACI/TNFRSF13B, TMEFF1/Tomoregulin-
1, TAG2, TMEFF2, TAPP1, TNF-alpha/TNFSF1A, CCL17/TARC, TNF-beta/TNFSF1B, Tau, TNF
RI/TNFRSF1A, TC21/R-Ras2, TNF RII/TNFRSF1B, TCAM-1, TOR, TCCR/WSX-1, TP-1, TC-PTP,
TP63/TP73L, TDG, TR, CCL25/TECK, TR alpha/NR1A1, Tenascin C, TR beta 1/NR1A2, Tenascin R,
TR2/NR2C1, TER-119, TR4/NR2C2, TERT, TRA-1-85, Testican 1/SPOCK1, TRADD, Testican
2/SPOCK2, TRAF-1, Testican 3/SPOCK3, TRAF-2, TFPI, TRAF-3, TFPI-2, TRAF-4, TGF-alpha, TRAF-
6, TGF-beta, TRAIL/TNFSF10, TGF-beta 1, TRAIL R1/TNFRSF10A, LAP (TGF-beta 1), TRAIL
R2/TNFRSF10B, Latent TGF-beta 1, TRAIL R3/TNFRSF10C, TGF-beta 1.2, TRAIL R4/TNFRSF10D,
TGF-beta 2, TRANCE/TNFSF11, TGF-beta 3, TfR (Transferrin R), TGF-beta 5, Apo-Transferrin, Latent
TGF-beta bp1, Holo-Transferrin, Latent TGF-beta bp2, Trappin-2/Elafin, Latent TGF-beta bp4, TREM-1,
TGF-beta RI/ALK-5, TREM-2, TGF-beta RII, TREM-3, TGF-beta RIIb, TREML1/TLT-1, TGF-beta RIII,
TRF-1, Thermolysin, TRF-2, Thioredoxin-1, TRH-degrading Ectoenzyme/TRHDE, Thioredoxin-2,
TRIM5, Thioredoxin-80, Tripeptidyl-Peptidase I, Thioredoxin-like 5/TRP14, TrkA, THOP1, TrkB,
Thrombomodulin/CD141, TrkC, Thrombopoietin, TROP-2, Thrombopoietin R, Troponin I Peptide
3, Thrombospondin-1, Troponin T, Thrombospondin-2, TROY/TNFRSF19, Thrombospondin-4, Trypsin 1,
Thymopoietin, Trypsin 2/PRSS2, Thymus Chemokine-1, Trypsin 3/PRSS3, Tie-1, Tryptase-5/Prss32,
Tie-2, Tryptase alpha/TPS1, TIM-1/KIM-1/HAVCR, Tryptase beta-1/MCPT-7, TIM-2, Tryptase beta-
2/TPSB2, TIM-3, Tryptase epsilon/BSSP-4, TIM-4, Tryptase gamma-1/TPSG1, TIM-5, Tryptophan
Hydroxylase, TIM-6, TSC22, TIMP-1, TSG, TIMP-2, TSG-6, TIMP-3, TSK, TIMP-4, TSLP, 
TL1A/TNFSF15, TSLP R, TLR1, TSP50, TLR2, beta-III Tubulin, TLR3, TWEAK/TNFSF12, TLR4,
TWEAK R/TNFRSF12, TLR5, Tyk2, TLR6, Phospho-Tyrosine, TLR9, Tyrosine Hydroxylase,
TLX/NR2E1, Tyrosine Phosphatase Substrate I, Ubiquitin, UNC5H3, Ugi, UNC5H4, UGRP1, UNG,
ULBP-1, uPA, ULBP-2, uPAR, ULBP-3, URB, UNC5H1, UVDE, UNC5H2, Vanilloid R1, VEGF
R, VASA, VEGF R1/Flt-1, Vasohibin, VEGF R2/KDR/Flk-1, Vasorin, VEGF R3/Flt-4, Vasostatin,
Versican, Vav-1, VG5Q, VCAM-1, VHR, VDR/NR1I1, Vimentin, VEGF, Vitronectin, VEGF-B,
VLDLR, VEGF-C, vWF-A2, VEGF-D, Synuclein-alpha, Ku70, WASP, Wnt-7b, WIF-1, Wnt-8a,
WISP-1/CCN4, Wnt-8b, WNK1, Wnt-9a, Wnt-1, Wnt-9b, Wnt-3a, Wnt-10a, Wnt-4, Wnt-10b, Wnt-
5a, Wnt-11, Wnt-5b, wnvNS3, Wnt7a, XCR1, XPE/DDB1, XEDAR, XPE/DDB2, Xg, XPF, XIAP, XPG,
XPA, XPV, XPD, XRCC1, Yes, YY1, EphA4.

[00215] Numerous human ion channels are targets of particular interest. Non-limiting examples include 5-
hydroxytryptamine 3 receptor B subunit, 5-hydroxytryptamine 3 receptor precursor, 5-hydroxytryptamine
receptor 3 subunit C, AAD14 protein, Acetylcholine receptor protein, alpha subunit precursor,
Acetylcholine receptor protein, beta subunit precursor, Acetylcholine receptor protein, delta subunit
precursor, Acetylcholine receptor protein, epsilon subunit precursor, Acetylcholine receptor protein,
gamma subunit precursor, Acid sensing ion channel 3 splice variant b, Acid sensing ion channel 3 splice 
variant c, Acid sensing ion channel 4, ADP-ribose pyrophosphatase, mitochondrial precursor, Alpha 1A-
voltage-dependent calcium channel, Amiloride-sensitive cation channel 1, neuronal, Amiloride-sensitive
cation channel 2, neuronal, Amiloride-sensitive cation channel 4, isoform 2, Amiloride-sensitive sodium
channel, Amiloride-sensitive sodium channel alpha-subunit, Amiloride-sensitive sodium channel beta-
subunit, Amiloride-sensitive sodium channel delta-subunit, Amiloride-sensitive sodium channel gamma-
subunit, Annexin A7, Apical-like protein, ATP-sensitive inward rectifier potassium channel 1, ATP-
sensitive inward rectifier potassium channel 10, ATP-sensitive inward rectifier potassium channel 11, ATP- 
sensitive inward rectifier potassium channel 14, ATP-sensitive inward rectifier potassium channel 15, ATP- 
sensitive inward rectifier potassium channel 8, Calcium channel alpha12.2 subunit, Calcium channel
alpha12.2 subunit, Calcium channel alpha1E subunit, delta19 delta40 delta46 splice variant, Calcium-
activated potassium channel alpha subunit 1, Calcium-activated potassium channel beta subunit 1, 
Calcium-activated potassium channel beta subunit 2, Calcium-activated potassium channel beta subunit 3, 
Calcium-dependent chloride channel-1, Cation channel TRPM4B, CDNA FLJ90453 fis, clone 
NT2RP3001542, highly similar to Potassium channel tetramerisation domain containing 6, CDNA
FLJ90663 fis, clone PLACE1005031, highly similar to Chloride intracellular channel protein 5, CGMP-
gated cation channel beta subunit, Chloride channel protein, Chloride channel protein 2, Chloride channel
protein 3, Chloride channel protein 4, Chloride channel protein 5, Chloride channel protein 6, Chloride
channel protein ClC-Ka, Chloride channel protein ClC-Kb, Chloride channel protein, skeletal muscle,
Chloride intracellular channel 6, Chloride intracellular channel protein 3, Chloride intracellular channel
protein 4, Chloride intracellular channel protein 5, CHRNA3 protein, Clcn3e protein, CLCNKB protein, 
CNGA4 protein, Cullin-5, Cyclic GMP gated potassium channel, Cyclic-nucleotide-gated cation channel 4, 
Cyclic-nucleotide-gated cation channel alpha 3, Cyclic-nucleotide-gated cation channel beta 3, Cyclic- 
nucleotide-gated olfactory channel, Cystic fibrosis transmembrane conductance regulator, Cytochrome B- 
245 heavy chain, Dihydropyridine-sensitive L-type, calcium channel alpha-2/delta subunits precursor,
FXYD domain-containing ion transport regulator 3 precursor, FXYD domain-containing ion transport 
regulator 5 precursor, FXYD domain-containing ion transport regulator 6 precursor, FXYD domain- 
containing ion transport regulator 7, FXYD domain-containing ion transport regulator 8 precursor, G 
protein-activated inward rectifier potassium channel 1, G protein-activated inward rectifier potassium 
channel 2, G protein-activated inward rectifier potassium channel 3, G protein-activated inward rectifier
potassium channel 4, Gamma-aminobutyric-acid receptor alpha-1 subunit precursor, Gamma-aminobutyric-
acid receptor alpha-2 subunit precursor, Gamma-aminobutyric-acid receptor alpha-3 subunit precursor,
Gamma-aminobutyric-acid receptor alpha-4 subunit precursor, Gamma-aminobutyric-acid receptor alpha-5
subunit precursor, Gamma-aminobutyric-acid receptor alpha-6 subunit precursor, Gamma-aminobutyric-
acid receptor beta-1 subunit precursor, Gamma-aminobutyric-acid receptor beta-2 subunit precursor,
Gamma-aminobutyric-acid receptor beta-3 subunit precursor, Gamma-aminobutyric-acid receptor delta
subunit precursor, Gamma-aminobutyric-acid receptor epsilon subunit precursor, Gamma-aminobutyric-
acid receptor gamma-1 subunit precursor, Gamma-aminobutyric-acid receptor gamma-3 subunit precursor,
Gamma-aminobutyric-acid receptor pi subunit precursor, Gamma-aminobutyric-acid receptor rho-1 subunit
precursor, Gamma-aminobutyric-acid receptor rho-2 subunit precursor, Gamma-aminobutyric-acid receptor
theta subunit precursor, GluR6 kainate receptor, Glutamate receptor 1 precursor, Glutamate receptor 2 
precursor, Glutamate receptor 3 precursor, Glutamate receptor 4 precursor, Glutamate receptor 7,
Glutamate receptor B, Glutamate receptor delta-1 subunit precursor, Glutamate receptor, ionotropic kainate
1 precursor, Glutamate receptor, ionotropic kainate 2 precursor, Glutamate receptor, ionotropic kainate 3 
precursor, Glutamate receptor, ionotropic kainate 4 precursor, Glutamate receptor, ionotropic kainate 5 
precursor, Glutamate [NMDA] receptor subunit 3A precursor, Glutamate [NMDA] receptor subunit 3B 
precursor, Glutamate [NMDA] receptor subunit epsilon 1 precursor, Glutamate [NMDA] receptor subunit
epsilon 2 precursor, Glutamate [NMDA] receptor subunit epsilon 4 precursor, Glutamate [NMDA] receptor
subunit zeta 1 precursor, Glycine receptor alpha-1 chain precursor, Glycine receptor alpha-2 chain
precursor, Glycine receptor alpha-3 chain precursor, Glycine receptor beta chain precursor, H/ACA 
ribonucleoprotein complex subunit 1, High affinity immunoglobulin epsilon receptor beta-subunit, 
Hypothetical protein DKFZp31310334, Hypothetical protein DKFZp761M1724, Hypothetical protein
FLJ12242, Hypothetical protein FLJ14389, Hypothetical protein FLJ14798, Hypothetical protein
FLJ14995, Hypothetical protein FLJ16180, Hypothetical protein FLJ16802, Hypothetical protein
FLJ32069, Hypothetical protein FLJ37401, Hypothetical protein FLJ38750, Hypothetical protein
FLJ40162, Hypothetical protein FLJ41415, Hypothetical protein FLJ90576, Hypothetical protein 
FLJ90590, Hypothetical protein FLJ90622, Hypothetical protein KCTD15, Hypothetical protein
MGC15619, Inositol 1,4,5-trisphosphate receptor type 1, Inositol 1,4,5-trisphosphate receptor type 2,
Inositol 1,4,5-trisphosphate receptor type 3, Intermediate conductance calcium-activated potassium channel
protein 4, Inward rectifier potassium channel 13, Inward rectifier potassium channel 16, Inward rectifier
potassium channel 4, Inward rectifying K(+) channel negative regulator Kir2.2v, Kainate receptor subunit 
KA2a, KCNH5 protein, KCTD17 protein, KCTD2 protein, Keratinocytes associated transmembrane
protein 1, Kv channel-interacting protein 4, Melastatin 1, Membrane protein MLC1, MGC15619 protein,
Mucolipin-1, Mucolipin-2, Mucolipin-3, Multidrug resistance-associated protein 4, N-methyl-D-aspartate
receptor 2C subunit precursor, NADPH oxidase homolog 1, Nav1.5, Neuronal acetylcholine receptor
protein, alpha-10 subunit precursor, Neuronal acetylcholine receptor protein, alpha-2 subunit precursor,
Neuronal acetylcholine receptor protein, alpha-3 subunit precursor, Neuronal acetylcholine receptor
protein, alpha-4 subunit precursor, Neuronal acetylcholine receptor protein, alpha-5 subunit precursor, 
Neuronal acetylcholine receptor protein, alpha-6 subunit precursor, Neuronal acetylcholine receptor 
protein, alpha-7 subunit precursor, Neuronal acetylcholine receptor protein, alpha-9 subunit precursor, 
Neuronal acetylcholine receptor protein, beta-2 subunit precursor, Neuronal acetylcholine receptor protein, 
beta-3 subunit precursor, Neuronal acetylcholine receptor protein, beta-4 subunit precursor, Neuronal
voltage-dependent calcium channel alpha 2D subunit, P2X purinoceptor 1, P2X purinoceptor 2, P2X 
purinoceptor 3, P2X purinoceptor 4, P2X purinoceptor 5, P2X purinoceptor 6, P2X purinoceptor 7, 






Pancreatic potassium channel TALK-1b, Pancreatic potassium channel TALK-1c, Pancreatic potassium
channel TALK-1d, Phospholemman precursor, Plasmolipin, Polycystic kidney disease 2 related protein,
Polycystic kidney disease 2-like 1 protein, Polycystic kidney disease 2-like 2 protein, Polycystic kidney 
disease and receptor for egg jelly related protein precursor, Polycystin-2, Potassium channel regulator, 
Potassium channel subfamily K member 1, Potassium channel subfamily K member 10, Potassium channel 
subfamily K member 12, Potassium channel subfamily K member 13, Potassium channel subfamily K 
member 15, Potassium channel subfamily K member 16, Potassium channel subfamily K member 17,
Potassium channel subfamily K member 2, Potassium channel subfamily K member 3, Potassium channel 
subfamily K member 4, Potassium channel subfamily K member 5, Potassium channel subfamily K 
member 6, Potassium channel subfamily K member 7, Potassium channel subfamily K member 9, 
Potassium channel tetramerisation domain containing 3, Potassium channel tetramerisation domain 
containing protein 12, Potassium channel tetramerisation domain containing protein 14, Potassium channel 
tetramerisation domain containing protein 2, Potassium channel tetramerisation domain containing protein 
4, Potassium channel tetramerisation domain containing protein 5, Potassium channel tetramerization 
domain containing 10, Potassium channel tetramerization domain containing protein 13, Potassium channel 
tetramerization domain-containing 1, Potassium voltage-gated channel subfamily A member 1, Potassium
voltage-gated channel subfamily A member 2, Potassium voltage-gated channel subfamily A member 4, 
Potassium voltage-gated channel subfamily A member 5, Potassium voltage-gated channel subfamily A 
member 6, Potassium voltage-gated channel subfamily B member 1, Potassium voltage-gated channel 
subfamily B member 2, Potassium voltage-gated channel subfamily C member 1, Potassium voltage-gated 
channel subfamily C member 3, Potassium voltage-gated channel subfamily C member 4, Potassium 
voltage-gated channel subfamily D member 1, Potassium voltage-gated channel subfamily D member 2,
Potassium voltage-gated channel subfamily D member 3, Potassium voltage-gated channel subfamily E 
member 1, Potassium voltage-gated channel subfamily E member 2, Potassium voltage-gated channel 
subfamily E member 3, Potassium voltage-gated channel subfamily E member 4, Potassium voltage-gated 
channel subfamily F member 1, Potassium voltage-gated channel subfamily G member 1, Potassium 
voltage-gated channel subfamily G member 2, Potassium voltage-gated channel subfamily G member 3, 
Potassium voltage-gated channel subfamily G member 4, Potassium voltage-gated channel subfamily H 
member 1, Potassium voltage-gated channel subfamily H member 2, Potassium voltage-gated channel 
subfamily H member 3, Potassium voltage-gated channel subfamily H member 4, Potassium voltage-gated 
channel subfamily H member 5, Potassium voltage-gated channel subfamily H member 6, Potassium 
voltage-gated channel subfamily H member 7, Potassium voltage-gated channel subfamily H member 8, 
Potassium voltage-gated channel subfamily KQT member 1, Potassium voltage-gated channel subfamily 
KQT member 2, Potassium voltage-gated channel subfamily KQT member 3, Potassium voltage-gated 
channel subfamily KQT member 4, Potassium voltage-gated channel subfamily KQT member 5, Potassium
voltage-gated channel subfamily S member 1, Potassium voltage-gated channel subfamily S member 2, 
Potassium voltage-gated channel subfamily S member 3, Potassium voltage-gated channel subfamily V 
member 2, Potassium voltage-gated channel, subfamily H, member 7, isoform 2, Potassium/sodium 
hyperpolarization-activated cyclic nucleotide-gated channel 1, Potassium/sodium hyperpolarization-
activated cyclic nucleotide-gated channel 2, Potassium/sodium hyperpolarization-activated cyclic
nucleotide-gated channel 3, Potassium/sodium hyperpolarization-activated cyclic nucleotide-gated channel
4, Probable mitochondrial import receptor subunit TOM40 homolog, Purinergic receptor P2X5, isoform A,
Putative 4 repeat voltage-gated ion channel, Putative chloride channel protein 7, Putative GluR6 kainate
receptor, Putative ion channel protein CATSPER2 variant 1, Putative ion channel protein CATSPER2
variant 2, Putative ion channel protein CATSPER2 variant 3, Putative regulator of potassium channels 
protein variant 1, Putative tyrosine-protein phosphatase TPTE, Ryanodine receptor 1, Ryanodine receptor 
2, Ryanodine receptor 3, SH3KBP1 binding protein 1, Short transient receptor potential channel 1, Short 
transient receptor potential channel 4, Short transient receptor potential channel 5, Short transient receptor 
potential channel 6, Short transient receptor potential channel 7, Small conductance calcium-activated 
potassium channel protein 1, Small conductance calcium-activated potassium channel protein 2, isoform b, 
Small conductance calcium-activated potassium channel protein 3, isoform b, Small-conductance calcium- 
activated potassium channel SK2, Small-conductance calcium-activated potassium channel SK3, Sodium 
channel, Sodium channel beta-1 subunit precursor, Sodium channel protein type II alpha subunit, Sodium 
channel protein type III alpha subunit, Sodium channel protein type IV alpha subunit, Sodium channel 
protein type IX alpha subunit, Sodium channel protein type V alpha subunit, Sodium channel protein type 
VII alpha subunit, Sodium channel protein type VIII alpha subunit, Sodium channel protein type X alpha 
subunit, Sodium channel protein type XI alpha subunit, Sodium-and chloride-activated ATP-sensitive
potassium channel, Sodium/potassium-transporting ATPase gamma chain, Sperm-associated cation channel
1, Sperm-associated cation channel 2, isoform 4, Syntaxin-1B1, Transient receptor potential cation channel
subfamily A member 1, Transient receptor potential cation channel subfamily M member 2, Transient 
receptor potential cation channel subfamily M member 3, Transient receptor potential cation channel 
subfamily M member 6, Transient receptor potential cation channel subfamily M member 7, Transient 
receptor potential cation channel subfamily V member 1, Transient receptor potential cation channel 
subfamily V member 2, Transient receptor potential cation channel subfamily V member 3, Transient 
receptor potential cation channel subfamily V member 4, Transient receptor potential cation channel 
subfamily V member 5, Transient receptor potential cation channel subfamily V member 6, Transient 
receptor potential channel 4 epsilon splice variant, Transient receptor potential channel 4 zeta splice 
variant, Transient receptor potential channel 7 gamma splice variant, Tumor necrosis factor, alpha-induced 
protein 1, endothelial, Two-pore calcium channel protein 2, VDAC4 protein, Voltage gated potassium 
channel Kv3.2b, Voltage gated sodium channel beta1B subunit, Voltage-dependent anion channel, Voltage-
dependent anion channel 2, Voltage-dependent anion-selective channel protein 1, Voltage-dependent
anion-selective channel protein 2, Voltage-dependent anion-selective channel protein 3, Voltage-dependent 
calcium channel gamma-1 subunit, Voltage-dependent calcium channel gamma-2 subunit, Voltage-
dependent calcium channel gamma-3 subunit, Voltage-dependent calcium channel gamma-4 subunit,
Voltage-dependent calcium channel gamma-5 subunit, Voltage-dependent calcium channel gamma-6 
subunit, Voltage-dependent calcium channel gamma-7 subunit, Voltage-dependent calcium channel 
gamma-8 subunit, Voltage-dependent L-type calcium channel alpha-1C subunit, Voltage-dependent L-type
calcium channel alpha-1D subunit, Voltage-dependent L-type calcium channel alpha-1S subunit, Voltage-
dependent L-type calcium channel beta-1 subunit, Voltage-dependent L-type calcium channel beta-2
subunit, Voltage-dependent L-type calcium channel beta-3 subunit, Voltage-dependent L-type calcium
channel beta-4 subunit, Voltage-dependent N-type calcium channel alpha-1B subunit, Voltage-dependent
P/Q-type calcium channel alpha-1A subunit, Voltage-dependent R-type calcium channel alpha-1E subunit,
Voltage-dependent T-type calcium channel alpha-1G subunit, Voltage-dependent T-type calcium channel
alpha-1H subunit, Voltage-dependent T-type calcium channel alpha-1I subunit, Voltage-gated L-type
calcium channel alpha-1 subunit, Voltage-gated potassium channel beta-1 subunit, Voltage-gated
potassium channel beta-2 subunit, Voltage-gated potassium channel beta-3 subunit, Voltage-gated
potassium channel KCNA7.
[00216] Exemplary GPCRs include but are not limited to Class A Rhodopsin like receptors such as Musc.
acetylcholine Vertebrate type 1, Musc. acetylcholine Vertebrate type 2, Musc. acetylcholine Vertebrate
type 3, Musc. acetylcholine Vertebrate type 4; Adrenoceptors (Alpha Adrenoceptors type 1, Alpha
Adrenoceptors type 2, Beta Adrenoceptors type 1, Beta Adrenoceptors type 2, Beta Adrenoceptors type 3),
Dopamine Vertebrate type 1, Dopamine Vertebrate type 2, Dopamine Vertebrate type 3, Dopamine
Vertebrate type 4, Histamine type 1, Histamine type 2, Histamine type 3, Histamine type 4, Serotonin type
1, Serotonin type 2, Serotonin type 3, Serotonin type 4, Serotonin type 5, Serotonin type 6, Serotonin type
7, Serotonin type 8, other Serotonin types, Trace amine, Angiotensin type 1, Angiotensin type 2,
Bombesin, Bradykinin, C5a anaphylatoxin, Fmet-leu-phe, APJ like, Interleukin-8 type A, Interleukin-8
type B, Interleukin-8 type others, C-C Chemokine type 1 through type 11 and other types, C-X-C
Chemokine (types 2 through 6 and others), C-X3-C Chemokine, Cholecystokinin CCK, CCK type A, CCK
type B, CCK others, Endothelin, Melanocortin (Melanocyte stimulating hormone, Adrenocorticotropic
hormone, Melanocortin hormone), Duffy antigen, Prolactin-releasing peptide (GPR10), Neuropeptide Y
(type 1 through 7), Neuropeptide Y, Neuropeptide Y other, Neurotensin, Opioid (type D, K, M, X),
Somatostatin (type 1 through 5), Tachykinin (Substance P (NK1), Substance K (NK2), Neuromedin K
(NK3), Tachykinin like 1, Tachykinin like 2), Vasopressin / vasotocin (type 1 through 2), Vasotocin,
Oxytocin / mesotocin, Conopressin, Galanin like, Proteinase-activated like, Orexin & neuropeptides
FF, QRFP, Chemokine receptor-like, Neuromedin U like (Neuromedin U, PRXamide), hormone protein
(Follicle stimulating hormone, Lutropin-choriogonadotropic hormone, Thyrotropin, Gonadotropin type I,
Gonadotropin type II), (Rhod)opsin, Rhodopsin Vertebrate (types 1-5), Rhodopsin Vertebrate type 5,
Rhodopsin Arthropod, Rhodopsin Arthropod type 1, Rhodopsin Arthropod type 2, Rhodopsin Arthropod
type 3, Rhodopsin Mollusc, Rhodopsin, Olfactory (Olfactory II fam 1 through 13), Prostaglandin
(prostaglandin E2 subtype EP1, Prostaglandin E2/D2 subtype EP2, prostaglandin E2 subtype EP3,
Prostaglandin E2 subtype EP4, Prostaglandin F2-alpha, Prostacyclin, Thromboxane), Adenosine type 1
through 3, Purinoceptors, Purinoceptor P2RY1-4,6,11 GPR91, Purinoceptor P2RY5,8,9,10 GPR35,92,174,
Purinoceptor P2RY12-14 GPR87 (UDP-Glucose), Cannabinoid, Platelet activating factor, Gonadotropin-
releasing hormone, Gonadotropin-releasing hormone type I, Gonadotropin-releasing hormone type II,
Adipokinetic hormone like, Corazonin, Thyrotropin-releasing hormone & Secretagogue, Thyrotropin-
releasing hormone, Growth hormone secretagogue, Growth hormone secretagogue like, Ecdysis-triggering
hormone (ETHR), Melatonin, Lysosphingolipid & LPA (EDG), Sphingosine 1-phosphate Edg-1,
Lysophosphatidic acid Edg-2, Sphingosine 1-phosphate Edg-3, Lysophosphatidic acid Edg-4, Sphingosine
1-phosphate Edg-5, Sphingosine 1-phosphate Edg-6, Lysophosphatidic acid Edg-7, Sphingosine 1-
phosphate Edg-8, Edg Other, Leukotriene B4 receptor, Leukotriene B4 receptor BLT1, Leukotriene B4
receptor BLT2, Class A Orphan/other, Putative neurotransmitters, SREB, Mas proto-oncogene & Mas-
related (MRGs), GPR45 like, Cysteinyl leukotriene, G-protein coupled bile acid receptor, Free fatty acid
receptor (GP40, GP41, GP43), Class B Secretin like, Calcitonin, Corticotropin releasing factor, Gastric
inhibitory peptide, Glucagon, Growth hormone-releasing hormone, Parathyroid hormone, PACAP,
Secretin, Vasoactive intestinal polypeptide, Latrophilin, Latrophilin type 1, Latrophilin type 2, Latrophilin
type 3, ETL receptors, Brain-specific angiogenesis inhibitor (BAI), Methuselah-like proteins (MTH),
Cadherin EGF LAG (CELSR), Very large G-protein coupled receptor, Class C Metabotropic glutamate /
pheromone, Metabotropic glutamate group I through III, Calcium-sensing like, Extracellular calcium-
sensing, Pheromone, calcium-sensing like other, Putative pheromone receptors, GABA-B, GABA-B
subtype 1, GABA-B subtype 2, GABA-B like, Orphan GPRC5, Orphan GPCR6, Bride of sevenless
proteins (BOSS), Taste receptors (T1R), Class D Fungal pheromone, Fungal pheromone A-Factor like
(STE2, STE3), Fungal pheromone B like (BAR, BBR, RCB, PRA), Class E cAMP receptors, Ocular albinism
proteins, Frizzled/Smoothened family, frizzled Group A (Fz 1&2&4&5&7-9), frizzled Group B (Fz 3 & 6),
frizzled Group C (other), Vomeronasal receptors, Nematode chemoreceptors, Insect odorant receptors, and
Class Z Archaeal/bacterial/fungal opsins.
[00217] The subject MURPs can be designed to target any cellular proteins including but not limited to cell surface 
protein, secreted protein, cytosolic protein, and nuclear protein. A target of particular interest is an ion 
channel. 

[00218] Ion channels constitute a superfamily of proteins, including the family of potassium channels (K-channels),
the family of sodium channels (Na-channels), the family of calcium channels (Ca-channels), the family of
chloride channels (Cl-channels) and the family of acetylcholine channels. Each of these families contains
subfamilies and each subfamily typically contains specific channels derived from single genes. For
example, the K-channel family contains subfamilies of voltage-gated K-channels called Kv1.x and Kv3.x.
The subfamily Kv1.x contains the channels Kv1.1, Kv1.2 and Kv1.3, which correspond to the products of
single genes and are thus called 'species'. The classification applies to the Na-, Ca-, Cl- and other families
of channels as well.
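
The family / subfamily / species hierarchy described above can be sketched as a small data structure. This is only an illustration: the `Channel` type and the `parse_channel` helper are hypothetical names introduced here, and the parsing rule (the text before the dot names the subfamily) is an assumption that fits the Kv1.x and Kv3.x examples but not every channel nomenclature.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Channel:
    family: str     # e.g. "K" for the K-channel family
    subfamily: str  # e.g. "Kv1"
    species: str    # e.g. "Kv1.3", the product of a single gene


def parse_channel(name: str, family: str) -> Channel:
    """Split a name such as 'Kv1.3' into subfamily 'Kv1' and species 'Kv1.3'.

    Assumption: the part before the dot identifies the subfamily.
    """
    subfamily, sep, member = name.partition(".")
    if not sep or not member:
        raise ValueError(f"expected a name of the form 'Kv1.3', got {name!r}")
    return Channel(family=family, subfamily=subfamily, species=name)


# Group the species named in the text by subfamily.
channels = [parse_channel(n, "K") for n in ("Kv1.1", "Kv1.2", "Kv1.3", "Kv3.1")]
by_subfamily = {}
for ch in channels:
    by_subfamily.setdefault(ch.subfamily, []).append(ch.species)

print(by_subfamily)
```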

[00219] Ion channels can also be classified according to the mechanisms by which the channels are operated. 

Specifically, the main types of ion channel proteins are characterized by the method employed to open or 
close the channel protein to either permit or prevent specific ions from permeating the channel protein and 
crossing a lipid bilayer cellular membrane. One important type of channel protein is the voltage-gated 
channel protein, which is opened or closed (gated) in response to changes in electrical potential across the 
cell membrane. The voltage-gated sodium channel 1.6 (Nav1.6) is of particular interest as a therapeutic
target. Another type of ion channel protein is the mechanically gated channel, for which a mechanical stress 
on the protein opens or closes the channel. Still another type is called a ligand-gated channel, which opens 
or closes depending on whether a particular ligand is bound to the protein. The ligand can be either an 
extracellular moiety, such as a neurotransmitter, or an intracellular moiety, such as an ion or nucleotide. 

[00220] Ion channels generally permit passive flow of ions down an electrochemical gradient, whereas ion pumps
use ATP to transport against a gradient. Coupled transporters, both antiporters and symporters, allow 
movement of one ion species against its gradient, powered by the downhill movement of another ion 
species. 

[00221] One of the most common types of channel proteins, found in the membrane of almost all animal cells,
permits the specific permeation of potassium ions across a cell membrane. In particular, potassium ions
permeate rapidly across cell membranes through K+ channel proteins (up to 10^8 ions per second).
Moreover, potassium channel proteins have the ability to distinguish between potassium ions and other
small alkali metal ions, such as Li+ or Na+, with great fidelity. In particular, potassium ions are at least ten
thousand times more permeant than sodium ions. Potassium channel proteins typically comprise four
(usually identical) subunits, so their cell surface targets are present as tetramers, allowing tetravalent
binding of MURPs. One type of subunit contains six long hydrophobic segments (which can be
membrane-spanning), while the other type contains two hydrophobic segments.
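
As a rough sanity check on the permeation rate quoted above, the flux of 10^8 ions per second can be converted into an electrical current through a single channel. The sketch below assumes monovalent ions (K+) and uses the SI value of the elementary charge; the function name is introduced here for illustration only.

```python
ELEMENTARY_CHARGE_C = 1.602176634e-19  # coulombs per elementary charge (SI definition)


def single_channel_current_pA(ions_per_second: float, valence: int = 1) -> float:
    """Convert an ion flux through one channel into a current in picoamperes."""
    amperes = ions_per_second * valence * ELEMENTARY_CHARGE_C
    return amperes * 1e12  # 1 A = 1e12 pA


current = single_channel_current_pA(1e8)
print(f"{current:.2f} pA")
```

Under these assumptions the upper-bound flux corresponds to a current on the order of ten picoamperes per channel.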
[00222] Another significant family of channels is the calcium channels. Calcium channels are generally classified
according to their electrophysiological properties as Low-voltage-activated (LVA) or High-voltage-
activated (HVA) channels. HVA channels comprise at least three groups of channels, known as L-, N-
and P/Q-type channels. These channels have been distinguished one from another electrophysiologically as
well as biochemically on the basis of their pharmacology and ligand binding properties. For instance,
dihydropyridines, diphenyl-alkylamines and piperidines bind to the alpha-1 subunit of the L-type calcium
channel and block a proportion of HVA calcium currents in neuronal tissue, which are termed L-type
calcium currents. N-type calcium channels are sensitive to omega conopeptides, but are relatively
insensitive to dihydropyridine compounds, such as nimodipine and nifedipine. P/Q-type channels, on the
other hand, are insensitive to dihydropyridines, but are sensitive to the funnel web spider toxin Aga IIIA.
R-type calcium channels, like L-, N-, P- and Q-type channels, are activated by large membrane
depolarizations, and are thus classified as high voltage-activated (HVA) channels. R-type channels are
generally insensitive to dihydropyridines and omega conopeptides, but, like P/Q, L and N channels, are
sensitive to the funnel web spider toxin Aga IVA. Immunocytochemical staining studies indicate that these
channels are located throughout the brain, particularly in deep midline structures (caudate-putamen,
thalamus, hypothalamus, amygdala, cerebellum) and in the nuclei of the ventral midbrain and brainstem.
Neuronal voltage-sensitive calcium channels typically consist of a central alpha-1 subunit, an alpha-2/delta
subunit, a beta subunit and a 95 kD subunit.

[00223] Additional non-limiting examples include Kir (an inwardly rectified potassium channel), Kv (a voltage-
gated potassium channel), Nav (a voltage-gated sodium channel), Cav (a voltage-gated calcium channel),
CNG (cyclic nucleotide-gated channel), HCN (hyperpolarization-activated channel), TRP (a transient
receptor potential channel), ClC (a chloride channel), CFTR (a cystic fibrosis transmembrane conductance
regulator, a chloride channel), IP3R (an inositol trisphosphate receptor), RYR (a ryanodine receptor). Other
channel types are 2-pore channels, glutamate-receptors (AMPA, NMDA, KA), M2, Connexins and Cys-
loop receptors.

[00224] A common layout for ion channel proteins, such as Kv1.2, Kv3.1, Shaker, TRPC1 and TRPC5, is to have
six membrane-spanning segments, arranged as follows:
N-terminus---S1--E1---S2-

[00225] Wherein S1-6 are membrane-spanning sequences, E1-3 are extracellular surface loops and X1-2 are
intracellular surface loops. The E3 loop is generally the longest of the three extracellular loops and is
hydrophilic so it is a good target for drugs and MURPs to bind. The pore-forming part of most channels is
a multimeric (e.g. tetrameric or rarely pentameric) complex of membrane-spanning alpha-helices. There is
generally a pore loop, which is a region of the protein that loops back into the membrane to form the
selectivity filter that determines which ion species can permeate. Such channels are called 'pore-loop'
channels.

[00226] Ion channels are valuable targets for drug design because they are involved in a broad range of
physiological processes. In humans, there are more than three hundred ion channel proteins,
many of which have been implicated in genetic diseases. For example, aberrant expression or function
of ion channels has been shown to cause a wide range of diseases including cardiac, neuronal, muscular,
respiratory and metabolic diseases. This section focuses on ion channels, but the same concepts and approaches
are equally applicable to all membrane proteins, including 7TMs, 1TMs, G-proteins and G-Protein Coupled
receptors (GPCRs), etc. Some of the ion channels are GPCRs.
[00227] Ion channels typically form large macromolecular complexes that include tightly bound accessory protein 
subunits and combinatorial use of such subunits contributes to the diversity of ion channels. These 
accessory proteins can also be the binding targets of the subject MURPs, microproteins and toxins. 
[00228] The subject MURPs can be designed to bind any of the channels known in the art and to those specifically 
exemplified herein. MURPs exhibiting a desired ion channel binding capability (encompassing specificity 
and avidity) can be selected by any recombinant and biochemical (e.g. expression and display) techniques 
known in the art. For instance, MURPs can be displayed by a genetic package including but not limited to 
phages and spores, and be subjected to panning against intact cell membranes, or preferably intact cells 
such as whole mammalian cells. To remove the phage that bind to the other, non-target cell surface 
molecules, the standard approach was to perform subtraction panning against similar cell lines that had a 
low or non-detectable level of the target receptor. However, Popkov et al. (J. Immunol. Methods 291:137-
151 (2004)) showed that related cell types are not ideal for subtraction because they generally have a
reduced but still significant level of the target on their surface, which reduces the number of desired phage 
clones. This problem occurs even when panning on cells that have been transfected with the gene encoding 
the target, followed by negative selection/subtraction on the same cell-line which was not transfected, 
especially when the native target gene was not knocked out. Instead, Popkov et al. showed that the 
negative selection or subtraction panning works much better if performed with an excess of the same cells 
that are used for normal panning (positive selection), except that the target has now been blocked with a 
high-affinity, target-specific inhibitor, such as a small molecule, peptide or an antibody to the target, which
makes the active site unavailable. This process is called "negative selection with epitope-masked cells", 
which is particularly useful in selecting the subject MURPs with a desired ion-channel binding capability. 
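
The logic of negative selection with epitope-masked cells can be illustrated with a deliberately simplified set model. Everything in this sketch is hypothetical: the clone names, the surface molecule names, and the all-or-nothing binding rule are assumptions made for illustration, and real panning depends on affinities, copy numbers and washing conditions that are not modelled here.

```python
def subtract(library, subtraction_surface):
    """Negative selection: discard clones that bind anything exposed on the subtraction cells."""
    return {clone: binds for clone, binds in library.items()
            if not (binds & subtraction_surface)}


def positive_select(library, target):
    """Positive selection: keep clones that bind the target."""
    return sorted(clone for clone, binds in library.items() if target in binds)


TARGET = "target_channel"
cell_surface = {TARGET, "receptor_A", "receptor_B"}

# Hypothetical displayed clones and the surface molecules each one binds.
library = {
    "clone1": {TARGET},                # the desired, specific binder
    "clone2": {"receptor_A"},          # off-target binder
    "clone3": {TARGET, "receptor_B"},  # cross-reactive binder
    "clone4": {"receptor_B"},          # off-target binder
}

# A related cell line still exposes some target, so subtraction removes the desired clone.
related_cells = set(cell_surface)
# Epitope-masked cells: the same cells, with the target blocked by a specific inhibitor.
masked_cells = cell_surface - {TARGET}

after_related = positive_select(subtract(library, related_cells), TARGET)
after_masked = positive_select(subtract(library, masked_cells), TARGET)
print(after_related, after_masked)
```

In this toy model, subtraction on the related cells loses the specific binder, whereas subtraction on the masked cells removes only the off-target and cross-reactive clones.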
[00229] In a separate embodiment, the present invention provides microproteins, and particularly microproteins 
exhibiting binding capability towards at least one family of ion channels. The present invention also 
provides a genetic package displaying such microproteins. Non-limiting ion-channel examples to which 
the subject microproteins bind are sodium, potassium, calcium, acetylcholine, and chloride channels. Of
particular interest are those microproteins and the genetic packages displaying such microproteins, which 
exhibit binding capability towards native targets. Native targets are generally natural molecules or 
fragments, derivatives thereof that the microprotein is known to bind, typically including those known 
binding targets that have been reported in the literature. 
[00230] The subject invention also provides a genetic package displaying an ion-channel-binding microprotein
which has been modified. The modified microprotein may (a) bind to a different family of channel as
compared to the corresponding unmodified microprotein; (b) bind to a different subfamily of the same
channel family as compared to the corresponding unmodified microprotein; (c) bind to a different species
of the same subfamily of channel as compared to the corresponding unmodified microprotein; (d) bind
to a different site on the same channel as compared to the corresponding unmodified
microprotein; and/or (e) bind to the same site of the same channel but yield a different biological effect as
compared to the corresponding unmodified microprotein.
[00231] Figures 22 and 46 show how microprotein domains or toxins that each bind at different sites of the same ion
channel can be combined into a single protein. The two binding sites that these two microproteins bind to
can be on two channels from different families, two channels from the same family but a different
subfamily, two channels from the same subfamily but a different species (gene product), or two different
binding sites on the same channel (species), or they can (simultaneously or not) bind the same binding site
on the same channel (species) since the channels are multimeric. The binding modules and domains that
bind to sites on the channels can be microprotein domains (natural or non-natural, 2- to 8-disulfide
containing), one-disulfide peptides, or linear peptides. These modules can be selected independently and
combined, or one can be selected from a library to bind in the presence of one fixed, active binding module.
In the latter case, the display library would display multiple modules of which one would contain a library
of variants. A typical goal is to select a dimer from this library that has a higher affinity than the active
monomer that was the starting point.

[00232] In another embodiment, the present invention provides a protein comprising a plurality of ion-channel
binding domains, wherein individual domains are microprotein domains that have been modified such that
(a) the microprotein domains bind to a different family of channel as compared to the
corresponding unmodified microprotein domains; (b) the microprotein domains bind to a different
subfamily of the same channel family as compared to the corresponding unmodified microprotein domains;
(c) the microprotein domains bind to a different species of the same subfamily as compared to the
corresponding unmodified microprotein domains; (d) the microprotein domains bind to a different site on
the same channel as compared to the corresponding unmodified microprotein domains; (e) the
microprotein domains bind to the same site of the same channel but yield a different biological effect as
compared to the corresponding unmodified microprotein domains; and/or (f) the microprotein domains
bind to the same site of the same channel and yield the same biological effect as compared to the
corresponding unmodified microprotein domains. Where desired, the microprotein domains may comprise
natural or non-natural sequences. The individual domains can be linked together via a heterologous linker.
The individual microprotein domains can bind to the same or different channel family, same or different
channel subfamily, same or different species of the same subfamily, same or different site on the same
channel.

[00233] The subject microproteins can be a toxin. Preferably, the toxin retains in part or in whole its toxicity
spectrum. In particular, venomous animals, such as snakes, encounter a range of prey and intruder species
and the venom toxins differ in activity for the different receptors of the different species. The venom
consists of a large number of related and unrelated toxins, with each toxin having a "spectrum of activity",
which can be defined as all of the receptors from all of the species on which that toxin has measurable
activity. All of the targets in the "spectrum of activity" are considered "native targets" and this includes any
human targets that the toxin is active against. The native target(s) of a microprotein or toxin include all of
the targets that the toxin is reported to inhibit in the literature. The higher the affinity or activity on a target,
the more likely that target is the natural, native target, but it is not uncommon for toxins to act on multiple
targets within the same species. Native target(s) can be human or non-human receptors that the toxin is
active against.

[00234] For the toxin to retain the ability to bind to cells after fusion to the display vector, it may be desirable to test
both the N-terminus and C-terminus for fusion and to test a variety of fusion sites (i.e., 0, 1, 2, 3, 4, 5 or 6 amino
acids before the first cysteine or after the last cysteine of the toxin domain, if the toxin domain is a cysteine-
containing domain) using a synthetic DNA library approach, preferably encoding a library of glycine-rich
linkers, which form the smallest amino acid chain, are uncharged and are most likely to be compatible with
binding of the toxin to the target. Since the N-terminal amino group and the C-terminal carboxyl group
may be involved in target binding, the library should contain a lysine or an arginine to mimic the positively
charged amino group (for fusions to the N-terminus of the toxin) and a glutamate or an aspartate to mimic
the negatively charged carboxyl group (for fusions to the C-terminus of the toxin).
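
The design space described above (two termini, fusion sites 0 to 6 residues from the flanking cysteine, and a charged residue that mimics the lost terminal group) can be enumerated in a few lines. This is a minimal sketch: the tuple layout and the function name are assumptions introduced here, and the glycine-rich linker sequences themselves are not modelled.

```python
from itertools import product

# Fusion-site offsets: residues retained before the first or after the last cysteine.
OFFSETS = range(7)  # 0, 1, 2, 3, 4, 5, 6

# One-letter codes of the residues proposed to mimic the charge of the lost terminus.
CHARGE_MIMICS = {
    "N": ("K", "R"),  # lysine or arginine for fusions to the toxin N-terminus
    "C": ("E", "D"),  # glutamate or aspartate for fusions to the toxin C-terminus
}


def fusion_designs():
    """List every (terminus, offset, mimic_residue) combination to be tested."""
    designs = []
    for terminus, residues in CHARGE_MIMICS.items():
        for offset, residue in product(OFFSETS, residues):
            designs.append((terminus, offset, residue))
    return designs


designs = fusion_designs()
print(len(designs), designs[:3])
```

Each tuple would then be combined with the library of glycine-rich linkers, so the number of constructs actually synthesized is larger than this count.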
[00235] The inhibitor(s) that are used to block the target during negative selection can be small molecules, peptides
or proteins, and natural or non-natural. In addition to simple subtraction, the choice of the mixture of
inhibitors is a valuable tool to control the specificity of the ion channel inhibitors that are being designed.
Because there are over three hundred ion channels in total, with partially overlapping specificities and
sequence similarities, and multiple modulatory sites per channel, each having a different effect, the
specificity requirement can be complex.
[00236] When modifying the activity of a toxin, or when combining two different toxins into a single protein, the
two toxins can bind the same channel at the same site and have the same physiologic effect, or the two
toxins can bind the same channel at the same site and have a different physiologic effect, or the two toxins
can bind to the same channel at a different site, or the two toxins can bind to different channels that belong
to the same subfamily (i.e. Kv1.3 and Kv1.2; meaning product of a different gene or 'species'), or the two
toxins can bind to different channels that belong to the same family (i.e. both are K-channels), or the two
toxins can bind to channels that belong to different families (i.e. K-channels versus Na-channels).
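
The cases listed above form a simple decision ladder over the targets of the two toxins. The sketch below is one possible encoding: the `BindingSite` type and the site labels are hypothetical, and the physiologic effect (same versus different effect at a shared site) is deliberately left out.

```python
from typing import NamedTuple


class BindingSite(NamedTuple):
    family: str     # e.g. "K" or "Na"
    subfamily: str  # e.g. "Kv1"
    species: str    # e.g. "Kv1.3"
    site: str       # hypothetical label for the site on the channel


def relationship(a: BindingSite, b: BindingSite) -> str:
    """Classify how the targets of two toxins relate, from most distant to closest."""
    if a.family != b.family:
        return "different families"
    if a.subfamily != b.subfamily:
        return "same family, different subfamily"
    if a.species != b.species:
        return "same subfamily, different species"
    if a.site != b.site:
        return "same channel, different site"
    return "same channel, same site"


kv13_pore = BindingSite("K", "Kv1", "Kv1.3", "pore")
kv12_pore = BindingSite("K", "Kv1", "Kv1.2", "pore")
kv13_sensor = BindingSite("K", "Kv1", "Kv1.3", "voltage sensor")
nav16_pore = BindingSite("Na", "Nav1", "Nav1.6", "pore")

print(relationship(kv13_pore, kv12_pore))
print(relationship(kv13_pore, nav16_pore))
```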
[00237] Ion channels typically have many transmembrane segments (24 for sodium channels) and thus offer a
number of different, non-competing and non-overlapping binding sites for modulators to alter the activity
of the channel in different ways. One approach is to create binders for one site on the same ion channel
from existing binders for a different site, even if these sites are unrelated. To achieve this, the existing toxin
can be used as a targeting agent for a library of 1-, 2-, 3-, or 4-disulfide proteins that is separated from the
targeting toxin by a flexible linker of 5, 6, 7, 8, 9, 10, 12, 14, 16, 18, 20, 25, 30, 35, 40 or 50 amino acids. It is useful
if the affinity of the targeting agent is not too high, so that the affinity of the new library can have a
significant contribution to the overall affinity. Another approach is to create new modulators for channels
from existing modulators for other channels that are related in sequence or in structure. The conotoxin
family, for example, contains sequence-related and structure-related modulators for Ca-, K-, Na-channels
and nicotinic acetylcholine receptors. It appears feasible to convert a K-channel modulator into a Na-
channel modulator using a library of conotoxin-derivatives, or vice versa. For example, Kappa-conotoxins
inhibit K-channels, Mu-conotoxins and Delta-conotoxins inhibit Na-channels, Omega-conotoxins inhibit
Ca-channels and Alpha-conotoxins inhibit acetylcholine receptors.

[00238] The proximity of different binding sites, each with a different effect on channel activity, on the same ion
channel makes it attractive to link the inhibitors using flexible linkers, creating a single inhibitor with two
domains, each binding at a different site. Alternatively, one can create a single protein with two domains that
bind at different copies of the same site, yielding a bivalent, high-affinity interaction (avidity). This approach
has not been taken by natural toxins, presumably because they must act fast and thus stay small in order to
have maximal tissue penetration, but for pharmaceuticals the speed of action is less important, making this
an attractive approach.

[00239] One can thus create combinatorial libraries of dimeric, trimeric, tetrameric or multimeric
toxins/modulators, each native or modified, and directly screen these libraries at the protein level or pan
the libraries using genetic packages for improved affinity (avidity, if binding occurs simultaneously at
multiple sites) and then characterize the specificity and activity of such multimeric clones by protein
expression and purification followed by cell-based activity assays, including patch-clamp assays. The
individual modules can be panned and selected separately, in isolation from each other, or they can be
designed in each other's presence, such that the new domain is added to a display system as a library that
also contains a fixed, active copy that serves as a targeting element for the library, and only clones that are
significantly better than the fixed, active monomer are selected and characterized.

[00240] Figs. 46 and 47 show some of the monomeric derivatives that can be made from native (natural) toxins, and 
some of the multimers that can be made to bind at multiple different binding sites of the target. The linkers 
are shown as glycine-rich rPEG, but the linkers could be any sequence and could also be optimized using 
molecular libraries followed by panning. One can create libraries inside the active, native toxin itself, using 
a variety of mutagenesis strategies as described above, or one can expand the existing area of contact with 
the target by creating libraries on the N-terminal or C-terminal side of the active toxin, hoping to create 
additional contacts with the target. Such libraries can be based on existing toxins with known activity for 
that site, or they can be naive 1-, 2-, 3-, 4-disulfide libraries based on unrelated microprotein scaffolds. 
These additional contact elements can be added on one or both sides of the active domains, and can be 
directly adjacent to the existing modulatory domain or they can be separated from it by flexible linkers. The 
initial multimer or the final, improved multimer can be a homomultimer or a heteromultimer, based on 
sequence similarity of the domains or based on target specificity of the domains of the multimer. Thus, the 
monomers that comprise the multimer may bind to the same target sites but have the same or different 
sequences. With 10-100 different native toxins that are known to bind to each family of channels, and with 
2,3,4,5 or 6 domains per clone, display libraries with a huge combinatorial diversity can be created even if 
one only uses native toxin sequences. Low level synthetic mutagenesis based on amino acid similarity or on 
phylogenetic substitution rates within the family can be used to create high quality libraries of mutants, of 
which a very high fraction is expected to retain function, with a high probability of enhanced function in 
some of the properties of interest. 
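
The combinatorial argument above can be made concrete with a short calculation. The sketch below assumes, for illustration only, that every domain position in a multimer can hold any one of the native toxins and that order matters and repeats are allowed; the function name `library_diversity` is hypothetical and not part of the disclosure.

```python
def library_diversity(n_toxins: int, n_domains: int) -> int:
    """Count ordered multimers with n_domains positions, each filled by
    any one of n_toxins native sequences (repeats allowed)."""
    if n_toxins < 1 or n_domains < 1:
        raise ValueError("n_toxins and n_domains must be positive")
    return n_toxins ** n_domains


if __name__ == "__main__":
    # Ranges taken from the text: 10-100 native toxins, 2-6 domains per clone.
    for n_toxins in (10, 100):
        for n_domains in range(2, 7):
            size = library_diversity(n_toxins, n_domains)
            print(f"{n_toxins} toxins, {n_domains} domains: {size:.1e} clones")
```

Under these assumptions the theoretical diversity spans from 100 clones (10 toxins, dimers) to 10^12 clones (100 toxins, hexamers), before any mutagenesis is applied.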

[00241] The binding capability of the subject MURPs, microproteins, or toxins to a given ion channel can be 
measured in terms of Hill Coefficient. Hill Coefficient indicates the stoichiometry of the binding 
interaction. A Hill coefficient of 2 indicates that 2 inhibitors bind to each channel. One can also assess the 
allosteric modulation, which is modulation of activity at one site caused by binding at a distant site. 
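
As an illustration of how a Hill coefficient enters a dose-response description, the following minimal sketch evaluates the standard Hill equation. The function name and the example concentrations are assumptions made for illustration; in practice the coefficient would be estimated by fitting measured data.

```python
import math


def fraction_inhibited(conc: float, ic50: float, hill: float) -> float:
    """Hill equation: fraction of channel activity blocked at a given
    inhibitor concentration (conc and ic50 in the same units)."""
    if conc < 0 or ic50 <= 0 or hill <= 0:
        raise ValueError("conc must be >= 0; ic50 and hill must be > 0")
    return conc ** hill / (ic50 ** hill + conc ** hill)


if __name__ == "__main__":
    # Hypothetical inhibitor with IC50 = 1 (arbitrary units).
    for hill in (1.0, 2.0):
        for conc in (0.1, 1.0, 3.0, 10.0):
            f = fraction_inhibited(conc, ic50=1.0, hill=hill)
            print(f"Hill={hill}, conc={conc}: {100 * f:.1f}% inhibited")
    assert math.isclose(fraction_inhibited(1.0, 1.0, 2.0), 0.5)
```

A larger Hill coefficient gives a steeper transition around the IC50, which is how cooperative or multi-site binding shows up in such a curve.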

[00242] The biological activity or effect of an ion channel and the ability of the subject MURPs, microproteins or 
toxins to regulate an ion channel activity can be assessed using a variety of in vitro and in vivo assays. For 
instance, methods are available in the art for measuring voltage, measuring current, measuring membrane 
potential, measuring ion flux, e.g., potassium or rubidium, measuring ion concentration, measuring gating, 
measuring second messengers and transcription levels, and using e.g., voltage-sensitive dyes, radioactive 
tracers, and patch-clamp electrophysiology. In particular, such assays can be used to test for microproteins 
and toxins that can inhibit or activate an ion channel of interest. 

[00243] Specifically, potential channel inhibitors or activators can be tested in comparison to a suitable control to 
examine the extent of modulation. Control samples can also be samples untreated with the candidate 
activators or inhibitors. Inhibition is present when a given ion channel activity value relative to the control 
is about 90%, 80%, 70%, 60%, 50%, 40%, 30%, 20%, 10%, or even less. IC50 (the concentration of inhibitor 
that reduces the ion channel's activity by 50%) is a commonly used measure for determining the 
inhibitory effect; IC90 is defined analogously. Activation of channels is achieved when a given ion channel 
activity value relative to the control is increased by 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, 
100%, 200%, 500%, or more. 
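
The thresholds above can be sketched as a small calculation. The helper names below are hypothetical, the 90% and 110% cut-offs are simply the first values in the lists above, and the IC90 conversion assumes a Hill-type dose-response curve, which the text does not prescribe.

```python
import math


def percent_of_control(treated: float, control: float) -> float:
    """Channel activity of a treated sample as a percentage of control."""
    if control <= 0:
        raise ValueError("control activity must be positive")
    return 100.0 * treated / control


def classify(percent: float) -> str:
    """Label a sample using the first thresholds listed in the text."""
    if percent <= 90.0:
        return "inhibition"
    if percent >= 110.0:
        return "activation"
    return "no clear modulation"


def ic_x(ic50: float, hill: float, x: float) -> float:
    """Concentration giving x% inhibition, assuming a Hill-type curve."""
    if not 0.0 < x < 100.0:
        raise ValueError("x must lie strictly between 0 and 100")
    return ic50 * (x / (100.0 - x)) ** (1.0 / hill)


if __name__ == "__main__":
    p = percent_of_control(treated=45.0, control=90.0)
    print(p, classify(p))
    # With a Hill coefficient of 1, IC90 is nine times IC50.
    assert math.isclose(ic_x(ic50=1.0, hill=1.0, x=90.0), 9.0)
```

With a Hill coefficient of 2 the same conversion gives an IC90 of three times the IC50, so the ratio IC90/IC50 is itself informative about the steepness of the curve.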
[00244] Changes in ion flux may be assessed by determining changes in polarization (i.e., electrical potential) of the 
cell or membrane expressing the channel of interest. For instance, one method to determine changes in 
cellular polarization is by measuring changes in current (thereby measuring changes in polarization) with 
voltage-clamp and patch-clamp techniques, e.g., the "cell-attached" mode, the "inside-out" mode, and the 
"whole cell" mode (see, e.g., Ackerman et al., New Engl. J. Med. 336:1575-1595 (1997)). Whole cell 
currents are conveniently determined using the standard methodology (see, e.g., Hamill et al., Pflugers 
Archiv. 391:85 (1981)). Other known assays include radiolabeled rubidium flux assays and fluorescence 
assays using voltage-sensitive dyes (see, e.g., Vestergaard-Bogind et al., J. Membrane Biol. 88:67-75 
(1988); Daniel et al., J. Pharmacol. Meth. 25:185-193 (1991); Holevinsky et al., J. Membrane Biology 
137:59-70 (1994)). 

[00245] The effects of the candidate MURPs, microproteins, or toxins upon the function of a channel of interest can 
be measured by changes in the electrical currents or ionic flux or by the consequences of changes in 
currents and flux. The downstream effect of the candidate proteins on ion flux can be varied. Accordingly, 
any suitable physiological change can be used to assess the influence of a candidate protein on the test 
channels. The effects of candidate protein can be measured by a toxin binding assay. When the functional 
consequences are determined using intact cells or animals, one can also measure a variety of effects such as 
transmitter release (e.g., dopamine), hormone release (e.g., insulin), transcriptional changes to both known 
and uncharacterized genetic markers (e.g., northern blots), cell volume changes (e.g., in red blood cells), 
immunoresponses (e.g., T cell activation), changes in cell metabolism such as cell growth or pH changes, 
and changes in intracellular second messengers such as Ca2+. 
[00246] Other key biological activities of ion channels are ion selectivity and gating. Selectivity is the ability of 
some channels to discriminate between ion species, allowing some to pass through the pore while 
excluding others. Gating is the transition between open and closed states. They can be assessed by any of 
the methods known in the art or disclosed herein. 
[00247] Yet another biological property that the subject MURP, microprotein, or toxin can be selected for is the 
frequency of opening and closing of the target channels, called Gating Frequency. Gating Frequency is 
influenced by voltage (in voltage-gated channels, which are opened or closed by changes in membrane 
voltage) and ligand-binding. The transition rate between open and closed states is typically 
<10 microseconds but can be increased or decreased by other molecules. The flux rate (current) through the 
pore when it is open is on the order of 10e7 ions per second for ion channels and much less for coupled 
exchangers. Following opening, some voltage-gated channels enter an inactivated, non-conducting state in 
which they are refractory to depolarization. 
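
To relate an ion flux rate to a measurable current, the flux can be multiplied by the charge carried per ion. The sketch below is a worked example only; it assumes monovalent ions and reads the "10e7" in the text as 10^7 ions per second, which is an interpretation rather than something the text states.

```python
import math

ELEMENTARY_CHARGE_C = 1.602176634e-19  # coulombs, exact SI value


def single_channel_current_pA(ions_per_second: float, valence: int = 1) -> float:
    """Current in picoamperes carried by a flux of ions through one open pore."""
    amperes = ions_per_second * abs(valence) * ELEMENTARY_CHARGE_C
    return amperes * 1e12


if __name__ == "__main__":
    # Assumed reading of the flux rate in the text: 1e7 monovalent ions per second.
    current = single_channel_current_pA(1e7)
    print(f"{current:.2f} pA")
    assert math.isclose(current, 1.602176634, rel_tol=1e-9)
```

A flux of this order corresponds to a current in the picoampere range, which is the scale resolved by single-channel patch-clamp recordings.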



EXAMPLES 

Example: Design of a Glycine-Serine oligomer based on human sequences 
[00248] The human genome data base was searched for sequences that are rich in glycine. Three sequences were 
identified as suitable donor sequences as shown in Table X. 

Table X: Donor sequences for GRS design A. 

Accession    Sequences         Amino acid    Protein 
NP_009060    GGGSGGGSGSGGGG    486-499       zinc finger protein 
Q9Y2X9       GSGSGGGGSGG       19-31         zinc finger protein 
CAG38801     SGGGGSGGGSGSG     7-19          MAP2K4 



[00249] Based on the sequences in Table X we designed a glycine rich sequence that contains multiple repeats of 
the peptide A with sequence GGGSGSGGGGS. Peptide A can be oligomerized to form structures with the 
formula (GGGSGSGGGGS)n where n is between 2 and 40. Figure 5 shows that all possible 9mer 
subsequences in oligomers of peptide A are contained in at least one of the proteins listed in table 3. Thus 
oligomers of peptide A do not contain human T cell epitopes. Inspection of figure 5 reveals that GRS 
based on oligomers of peptide A can begin and end at any of the positions of peptide A. 
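
The 9mer containment check described above can be sketched as follows. This is an illustrative implementation, not the procedure used to prepare Figure 5; the function names are hypothetical, and the reference sequences would have to be the full-length human proteins rather than the short excerpts shown in Table X.

```python
def kmers(seq: str, k: int = 9) -> set[str]:
    """All distinct k-mer subsequences of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def uncovered_kmers(oligomer: str, references: list[str], k: int = 9) -> list[str]:
    """k-mers of the oligomer that occur in none of the reference sequences."""
    return sorted(
        mer for mer in kmers(oligomer, k)
        if not any(mer in ref for ref in references)
    )


if __name__ == "__main__":
    peptide_a = "GGGSGSGGGGS"
    oligomer = peptide_a * 4
    # A repeat of 11 residues yields 11 distinct 9mers once n >= 2.
    print(len(kmers(oligomer)), "distinct 9mers")
    # Placeholder references: substitute full-length human protein sequences.
    references = ["GGGSGGGSGSGGGG", "GSGSGGGGSGG", "SGGGGSGGGSGSG"]
    print("not found in the excerpts:", uncovered_kmers(oligomer, references))
```

Because the repeat unit is 11 residues long, only 11 distinct 9mers have to be checked regardless of n, which is why the result holds for any oligomer length and any start or end position.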
Example: Design of glycine-proline oligomer based on human sequences 
[00250] Glycine rich sequences were designed based on sequence 
GPGGGGGPGGGGGPGGGGVGGGGGGGVGGGGGGPGGG, which represents amino acids 146-182 of 
the human class 4 POU domain with accession number NP_006228. Figure 6 illustrates that oligomers of 
peptide B with sequence GGGGGPGGGGP can be utilized as GRS. All 9mer subsequences that are 
contained in peptides with the sequence (GGGGGPGGGGP)n are also contained in the sequence of the 
POU domain. Thus, such oligomeric sequences do not contain T cell epitopes. 
Example: Design of glycine-glutamic acid oligomer 

[00251] Glycine rich sequences can be designed based on the subsequence GAGGEGGGGEGGGPGG that is part 
of the ribosomal protein S6 kinase (accession number BAD92170). For instance, oligomers of peptide C 
with the sequence GGGGE will form sequences where most 9mer subsequences will be contained in the 
sequence of ribosomal protein S6 kinase. Thus, oligomeric GRS of the general structure (GGGGE)n bear a 
very low risk of containing T cell epitopes. 

Example: Identification of human hydrophilic glycine-rich sequences 
[00252] A data base of human proteins was searched for subsequences that are rich in glycine residues. These 
subsequences contained at least 50% glycine. Only the following non-glycine residues were allowed to 
occur in the GRS: ADEHKPRST. 70 subsequences were identified that had a minimum length of 20 amino 
acids. These subsequences are listed in appendix A. They can be utilized to construct GRS with low 
immunogenic potential in humans. 
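
The search criteria above (at least 50% glycine, only ADEHKPRST as non-glycine residues, minimum length 20) can be sketched as a simple scan. This is a simplified illustration and not the search that produced appendix A: it only evaluates maximal runs of allowed residues, so a long run that falls below 50% glycine overall is rejected even if it contains a qualifying sub-window. The function name is hypothetical.

```python
import re

ALLOWED_NON_GLYCINE = "ADEHKPRST"
# A run is a maximal stretch consisting only of glycine and allowed residues.
RUN_PATTERN = re.compile(f"[G{ALLOWED_NON_GLYCINE}]+")


def glycine_rich_runs(protein: str, min_len: int = 20, min_gly: float = 0.5):
    """Return (start, end, sequence) for qualifying runs, 1-based inclusive."""
    hits = []
    for match in RUN_PATTERN.finditer(protein):
        run = match.group()
        if len(run) >= min_len and run.count("G") / len(run) >= min_gly:
            hits.append((match.start() + 1, match.end(), run))
    return hits


if __name__ == "__main__":
    # Synthetic test protein, not a real database entry.
    protein = "MLV" + "GGS" * 8 + "LLW"
    for start, end, run in glycine_rich_runs(protein):
        print(start, end, run)
```

Applied to each entry of a protein database, a scan of this kind would yield candidate donor sequences together with their residue positions.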
Example: Construction of rPEG_J288 
[00253] The following example describes the construction of a codon optimized gene encoding a URP sequence 
with 288 amino acids and the sequence (GSGGEG)48. First we constructed a stuffer vector pCW0051 as 
illustrated in Fig. 40. The sequence of the expression cassette in pCW0051 is shown in Fig. 42. The 
stuffer vector was based on a pET vector and includes a T7 promoter. The vector encodes a Flag sequence 
followed by a stuffer sequence that is flanked by BsaI, BbsI, and KpnI sites. The BsaI and BbsI sites were 
inserted such that they generate compatible overhangs after digestion as illustrated in Fig. 42. The stuffer 
sequence was followed by a His6 tag and the gene of green fluorescent protein (GFP). The stuffer sequence 
contains stop codons and thus E. coli cells carrying the stuffer plasmid pCW0051 formed non-fluorescent 
colonies. The stuffer vector pCW0051 was digested with BsaI and KpnI. A codon library encoding URP 
sequences of 36 amino acid length was constructed as shown in Fig. 41. The URP sequence was 
designated rPEG_J36 and had the amino acid sequence (GSGGEG)6. The insert was obtained by annealing 
synthetic oligonucleotide pairs encoding the amino acid sequence GSGGEGGSGGEG as well as a pair of 
oligonucleotides that encode an adaptor to the KpnI site. The following oligonucleotides were used: 

pr_LCW0057for: AGGTAGTGGWGGWGARGGWGGWTCYGGWGGAGAAGG 
pr_LCW0057rev: ACCTCCTTCTCCWCCRGAWCCWCCYTCWCCWCCACT 
pr_3KpnIstopperFor: AGGTTCGTCTTCACTCGAGGGTAC 
pr_3KpnIstopperRev: CCTCGAGTGAAGACGA 

[00254] The annealed 
oligonucleotide pairs were ligated, which resulted in a mixture of products with varying length that 
represents the varying number of rPEG_J12 repeats. The product corresponding to the length of rPEG_J36 
was isolated from the mixture by agarose gel electrophoresis and ligated into the BsaI/KpnI digested stuffer 
vector pCW0051. Most of the clones in the resulting library designated LCW0057 showed green 
fluorescence after induction, which shows that the sequence of rPEG_J36 had been ligated in frame with the 
GFP gene. The process of screening and iterative multimerization of rPEG_J36 sequences is illustrated in 
Fig. 14. We screened 288 isolates from library LCW0057 for high level of fluorescence. 48 isolates with 
strong fluorescence were analyzed by PCR to verify the length of the rPEG_J segment and 16 clones were 
identified that had the expected length of rPEG_J36. This process resulted in a collection of 16 isolates of 
rPEG_J36, which show high expression and which differ in their codon usage. The isolates were pooled 
and dimerized using a process outlined in Fig. 40. A plasmid mixture was digested with BsaI/NcoI and a 
fragment comprising the rPEG_J36 sequence and a part of GFP was isolated. The same plasmid mixture 
was also digested with BbsI/NcoI and the vector fragment comprising rPEG_J36, most of the plasmid 
vector, and the remainder of the GFP gene was isolated. Both fragments were mixed, ligated, and 
transformed into BL21 and isolates were screened for fluorescence. This process of dimerization was 
repeated two more rounds as outlined in Fig. 14. During each round, we doubled the length of the rPEG_J 
gene and ultimately obtained a collection of genes that encode rPEG_J288. The amino acid and nucleotide 
sequence of rPEG_J288 is shown in Fig. 15. It can be seen that the rPEG_J288 module contains segments 
of rPEG_J36 that differ in their nucleotide sequence despite having identical amino acid sequence. Thus 
we minimized internal homology in the gene and as a result we reduced the risk of spontaneous 
recombination. We cultured E. coli BL21 harboring plasmids encoding rPEG_J288 for at least 20 
doublings and no spontaneous recombination was observed. 
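
The codon-library idea in this example relies on degenerate positions that change the nucleotide sequence without changing the encoded peptide. The following sketch expands the forward oligonucleotide pr_LCW0057for into all of its concrete variants and translates each one. It assumes the standard genetic code and a reading frame that starts at the second nucleotide, so that the first base and the last two bases are treated as ligation overhang; that frame assignment is an inference from the stated amino acid sequence, not something spelled out in the text.

```python
from itertools import product

# IUPAC codes used in the oligo: W = A/T, R = A/G, Y = C/T.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "W": "AT", "R": "AG", "Y": "CT"}

# Standard genetic code, codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def expand(oligo: str) -> list[str]:
    """All concrete sequences encoded by a degenerate oligonucleotide."""
    return ["".join(p) for p in product(*(IUPAC[base] for base in oligo))]


def translate(dna: str, frame: int = 0) -> str:
    """Translate complete codons of dna starting at the given offset."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(frame, len(dna) - 2, 3))


if __name__ == "__main__":
    forward = "AGGTAGTGGWGGWGARGGWGGWTCYGGWGGAGAAGG"
    variants = expand(forward)
    peptides = {translate(v, frame=1) for v in variants}
    print(len(variants), "nucleotide variants encode", peptides)
```

Seven two-fold degenerate positions give 128 nucleotide variants per 12-residue unit, all encoding the same peptide, which is what allows repeats within one gene to differ in nucleotide sequence while keeping the amino acid sequence identical.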
Example: Construction of rPEG_H288 

[00255] A library of genes encoding a 288 amino acid URP termed rPEG_H288 was constructed using the same 
procedure that was used to construct rPEG_J288. rPEG_H288 has the amino acid sequence 
(GSGGEGGSGGSG)24. The flow chart of the construction process is shown in Fig. 14. The complete 
amino acid sequence as well as the nucleotide sequence of one isolate of rPEG_H288 is given in Fig. 16. 
Example: Serum stability of rPEG_J288 

[00256] A fusion protein containing an N-terminal Flag tag and the URP sequence rPEG_J288 fused to the 
N-terminus of green fluorescent protein was incubated in 50% mouse serum at 37°C for 3 days. Samples were 
withdrawn at various time points and analyzed by SDS PAGE followed by detection using Western 
analysis. An antibody against the N-terminal Flag tag was used for Western detection. Results are shown 
in Fig. 28, which indicate that a URP sequence of 288 amino acids can be completely stable in serum for at 
least three days. 

Example: Absence of pre-existing antibodies to rPEG_J288 in serum 
[00257] Existence of antibodies against URP would be an indication of a potential immunogenic response to this 
glycine rich sequence. To test for the presence of existing antibodies in serum, a URP-GFP fusion was 
subjected to an ELISA by immobilizing URP-GFP on a support and subsequently incubating with 30% 
serum. The presence of antibodies bound to URP-GFP was detected using an anti-IgG-horse radish 
peroxidase antibody and substrate. The data are shown in Fig. 29. The data show that the fusion protein 
can be detected by antibodies against GFP or Flag but not by murine serum. This indicates that murine 
serum does not contain antibodies against the URP sequence. 

Example: Purification of a fusion protein containing rPEG_J288 
[00258] We purified a protein with the architecture Flag-rPEG_J288-H6-GFP. The protein was expressed in E. coli 
BL21 in SB medium. Cultures were induced with 0.5 mM IPTG overnight at 18°C. Cells were harvested 
by centrifugation. The pellet was re-suspended in TBS buffer containing benzonase and a commercial 
protease inhibitor cocktail. The suspension was heated for 10 min at 75°C in a water bath to lyse the cells. 
Insoluble material was removed by centrifugation. The supernatant was purified using immobilized metal 
ion affinity chromatography (IMAC) followed by a column with immobilized anti-Flag antibody. Fig. 43 shows PAGE 
analysis of the purification process. The process yielded protein with at least 90% purity. 

Example: Construction of fusion protein between rPEG_J288 and interferon-alpha 
[00259] A gene encoding human interferon alpha was designed using codon optimization for E. coli expression. 
The synthetic gene was fused with a gene encoding rPEG_J288. A His6 tag was placed at the N-terminus 
to facilitate detection and purification of the fusion protein. The amino acid sequence of the fusion protein 
is given in Fig. 44. 

Example: Construction of rPEG_J288-G-CSF fusion 
[00260] A gene encoding human G-CSF was designed using codon optimization for E. coli expression. The 
synthetic gene was fused with a gene encoding rPEG_J288. A His6 tag was placed at the N-terminus to 
facilitate detection and purification of the fusion protein. The amino acid sequence of the fusion protein is 
given in Fig. 44. 

Example: Construction of rPEG_J288-hGH fusion 
[00261] A gene encoding human growth hormone was designed using codon optimization for E. coli expression. 
The synthetic gene was fused with a gene encoding rPEG_J288. A His6 tag was placed at the N-terminus 
to facilitate detection and purification of the fusion protein. The amino acid sequence of the fusion protein 
is given in Fig. 44. 

Example: Expression of fusion proteins between rPEG_J288 and human proteins 
[00262] The fusion proteins between rPEG_J288 and two human proteins, interferon-alpha and human growth 
hormone, were cloned into a T7 expression vector and transformed into E. coli BL21. The cells were grown 
at 37°C to an optical density of 0.5 OD. Subsequently, the cells were cultured at 18°C for 30 min. Then 0.5 
mM IPTG was added and the cultures were incubated in a shaking incubator at 18°C overnight. Cells were 
harvested by centrifugation and soluble protein was released using BugBuster (Novagen). Both insoluble 
and soluble protein fractions were separated by SDS-PAGE and the fusion proteins were detected by 
Western using an antibody against the N-terminal His6 tag for detection. Fig. 45 shows the Western 
analysis of the two fusion proteins as well as rPEG_J288-GFP as control. All fusion proteins were 
expressed and the majority of the protein was in the soluble fraction. This is evidence of the high solubility 
of rPEG_J288 because most attempts at expression of interferon-alpha and human growth hormone in 
the cytosol of E. coli that have been reported in the literature resulted in the formation of insoluble 
inclusion bodies. Fig. 45 shows that the majority of fusion proteins are expressed as full length proteins, 
i.e. no fragments that would suggest incomplete synthesis or partial protein degradation were detected. 

Example: Construction and binding of aVEGF multimer 

[00263] Libraries of cysteine-constrained peptides were constructed as published [Scholle, M. D., et al. (2005) 
Comb Chem High Throughput Screen, 8: 545-51]. These libraries were panned against human VEGF and 
two binding modules were identified consisting of amino acid sequences FTCTNHWCPS or 
FQCTRHWCPI. Oligonucleotides encoding the amino acid sequence FTCTNHWCPS were ligated to a 
nucleotide sequence encoding the URP sequence rPEG_A36 with the sequence (GGS)12. Subsequently, the 
fusion sequence was dimerized using restriction enzymes and ligation steps to construct a molecule that 
contains 4 copies of the VEGF binding module separated by rPEG_A36 fused to GFP. The VEGF binding 
affinities of fusion proteins containing between zero and four VEGF-binding units are compared in Fig. 30. 
A fusion protein containing only rPEG_A36 fused to GFP shows no affinity for VEGF. Adding increasing 
numbers of VEGF binding modules increases affinity of the resulting fusion proteins. 
Example: Discovery of 1SS binding modules against therapeutic targets 
[00264] Random peptide libraries were generated according to Scholle, et al. [Scholle, M. D., et al. (2005) Comb 
Chem High Throughput Screen, 8: 545-51]. The naive peptide libraries displayed cysteine-constrained 
peptides with cysteines spaced by 4 to 10 random residues. The library design is illustrated in the table: 

Table X: Naive 1SS libraries: 

Library    Peptide pattern     Spacing     Codon design 
LNG0001    XXXCXXCXXX          X3CX2CX3    NNS NNS NNS TGC NNS NNS TGT NNS NNS NNS 
LNG0002    XXCXXXCXXX          X2CX3CX3    NNS NNS TGC NNS NNS NNS TGT NNS NNS NNS 
LNG0003    XXCXXXXCXX          X2CX4CX2    NNS NNS TGC NNS NNS NNS NNS TGT NNS NNS 
LNG0004    XCXXXXXCXX          X1CX5CX2    NNS TGC NNS NNS NNS NNS NNS TGT NNS NNS 
LNG0005    XCXXXXXXCX          X1CX6CX1    NNS TGC NNS NNS NNS NNS NNS NNS TGT NNS 
LNG0006    CXXXXXXXCX          CX7CX1      TGC NNS NNS NNS NNS NNS NNS NNS TGT NNS 
LNG0007    CXXXXXXXXC          CX8C        TGC NNS NNS NNS NNS NNS NNS NNS NNS TGT 
LNG0008    CXXXXXXXXXC         CX9C        TGC NNS NNS NNS NNS NNS NNS NNS NNS NNS TGT 
LNG0009    CXXXXXXXXXXC        CX10C       TGC NNS NNS NNS NNS NNS NNS NNS NNS NNS NNS TGT 
LNG0010    XXXXXXCXXCXXXXXX    X6CX2CX6    NNS NNS NNS NNS NNS NNS TGC NNS NNS TGT NNS NNS NNS NNS NNS NNS 
LNG0011    XXXXXCXXXCXXXXXX    X5CX3CX6    NNS NNS NNS NNS NNS TGC NNS NNS NNS TGT NNS NNS NNS NNS NNS NNS 
LNG0012    XXXXXCXXXXCXXXXX    X5CX4CX5    NNS NNS NNS NNS NNS TGC NNS NNS NNS NNS TGT NNS NNS NNS NNS NNS 
LNG0013    XXXXCXXXXXCXXXXX    X4CX5CX5    NNS NNS NNS NNS TGC NNS NNS NNS NNS NNS TGT NNS NNS NNS NNS NNS 
LNG0014    XXXXCXXXXXXCXXXX    X4CX6CX4    NNS NNS NNS NNS TGC NNS NNS NNS NNS NNS NNS TGT NNS NNS NNS NNS 
LNG0015    XXXCXXXXXXXCXXXX    X3CX7CX4    NNS NNS NNS TGC NNS NNS NNS NNS NNS NNS NNS TGT NNS NNS NNS NNS 
LNG0016    XXXCXXXXXXXXCXXX    X3CX8CX3    NNS NNS NNS TGC NNS NNS NNS NNS NNS NNS NNS NNS TGT NNS NNS NNS 
LNG0017    XXCXXXXXXXXXCXXX    X2CX9CX3    NNS NNS TGC NNS NNS NNS NNS NNS NNS NNS NNS NNS TGT NNS NNS NNS 
LNG0018    XXCXXXXXXXXXXCXX    X2CX10CX2   NNS NNS TGC NNS NNS NNS NNS NNS NNS NNS NNS NNS NNS TGT NNS NNS 
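
The relationship between a peptide pattern and its codon design in the table can be sketched in a few lines. The helper names are hypothetical; the mapping (X to NNS, first cysteine to TGC, second cysteine to TGT) is read off the table rows, and the diversity figures use the general property that an NNS codon (N = any base, S = C or G) covers 32 codons encoding all 20 amino acids plus one stop codon.

```python
def codon_design(pattern: str) -> str:
    """Map a pattern such as 'XXXCXXCXXX' to its degenerate codon string."""
    codons = []
    cysteines_seen = 0
    for residue in pattern:
        if residue == "X":
            codons.append("NNS")
        elif residue == "C":
            # The table uses TGC for the first cysteine and TGT for the second.
            codons.append("TGC" if cysteines_seen == 0 else "TGT")
            cysteines_seen += 1
        else:
            raise ValueError(f"unexpected pattern character: {residue!r}")
    return " ".join(codons)


def nns_diversity(pattern: str) -> tuple[int, int]:
    """Theoretical (DNA, peptide) diversity for the randomized positions."""
    n_random = pattern.count("X")
    return 32 ** n_random, 20 ** n_random


if __name__ == "__main__":
    for name, pattern in [("LNG0001", "XXXCXXCXXX"), ("LNG0018", "XXCXXXXXXXXXXCXX")]:
        dna, peptide = nns_diversity(pattern)
        print(name, codon_design(pattern), f"DNA {dna:.1e}", f"peptide {peptide:.1e}")
```

For the 16-residue libraries the theoretical peptide diversity (20^14) far exceeds what a phage library can physically contain, so such libraries sample sequence space rather than cover it.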
[00265] The libraries were panned against a series of therapeutically relevant targets using the following protocol: 
Wells on immunosorbent ELISA plates were coated with 5 µg/ml of the target antigen in PBS overnight at 
4°C. Coated plates were washed with PBS, and non-specific sites were blocked with Blocking Buffer (PBS 
containing either 0.5% BSA or 0.5% Ovalbumin) for 2 h at room temperature. The plates were then washed 
with PBST (PBS containing 0.05% Tween 20), and phage particles at 1-5 x 10^12/ml in Binding Buffer 
(Blocking Buffer containing 0.05% Tween 20) were added to the wells and incubated with shaking for 2 h 
at room temperature. Wells were then emptied and washed with PBST. Bound phage particles were eluted 
from the wells by incubation with 100 mM HCl for 10 min at room temperature, transferred to sterile tubes, 
and neutralized with 1 M TRIS base. For infection, log phase E. coli SS320 growing in Super Broth 
supplemented with 5 µg/ml Tetracycline were added to the neutralized phage eluate, and the culture was 
incubated with shaking for 30 min at 37°C. Infected cultures were then transferred to larger tubes 
containing Super Broth with 5 µg/ml Tetracycline and the cultures were incubated with shaking overnight at 
37°C. The overnight cultures were cleared of E. coli by centrifugation, and phage were precipitated from 
the supernatant following the addition of a solution of 20% PEG and 2.5 M NaCl to a final PEG 
concentration of 4%. Precipitated phage were harvested by centrifugation, and the phage pellet was 
resuspended in 1 ml PBS, cleared of residual E. coli by centrifugation, and transferred to a fresh tube. Phage 
concentrations were estimated spectrophotometrically and phage was utilized for the next round of 
selection. Individual clones were screened for target binding affinity after 3 or 4 rounds of phage panning. 
Individual plaques from phage clones selected during the panning were picked into Super Broth containing 
5 µg/ml Tetracycline and grown overnight with shaking at 37°C. ELISA plates were prepared by coating 
antigen and control proteins (BSA, Ovalbumin, IgG) at 3 µg/ml in PBS overnight at 4°C. The plates were 
washed with PBS, and blocked with Blocking Buffer (PBS containing 0.5% BSA) for 2 h at room 
temperature. Overnight cultures were cleared of E. coli by centrifugation and the supernatant was diluted 
1:10 in Binding Buffer (Blocking Buffer containing 0.05% Tween 20) and transferred to the ELISA plates 
after washing with PBST (PBS containing 0.05% Tween 20). The plates were incubated with shaking for 
2 h at room temperature. Following washing with PBST, anti-M13-HRP (Pharmacia), 1:5000 dilution in 
PBS, was added to wells. The plates were incubated with shaking for 30 min at room temperature and 
washed with PBST, followed by PBS. A substrate solution containing 0.4 mg/ml ABTS and 0.001% H2O2 
in 50 mM phosphate-citrate buffer was added to the wells, and allowed to develop for 40 min, after which the 
plates were read in a plate reader at 405 nm. These ELISA readings allowed the determination of clone 
specificity, and antigen-specific clones were sequenced commercially via established methods. 
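
The PEG precipitation step in the protocol above specifies a stock (20% PEG, 2.5 M NaCl) and a final PEG concentration (4%), which fixes the volume of stock to add. A minimal sketch of that dilution arithmetic follows; the function names and the 4 ml example volume are illustrative assumptions.

```python
import math


def stock_volume_to_add(sample_ml: float, stock_pct: float = 20.0,
                        final_pct: float = 4.0) -> float:
    """Volume of stock to add so the mixture reaches final_pct.

    Solves stock_pct * v = final_pct * (sample_ml + v) for v.
    """
    if not 0.0 < final_pct < stock_pct:
        raise ValueError("final_pct must lie between 0 and stock_pct")
    return sample_ml * final_pct / (stock_pct - final_pct)


def final_concentration(stock_conc: float, stock_ml: float, sample_ml: float) -> float:
    """Concentration of a stock component after mixing with the sample."""
    return stock_conc * stock_ml / (stock_ml + sample_ml)


if __name__ == "__main__":
    supernatant_ml = 4.0  # illustrative volume, not from the protocol
    peg_ml = stock_volume_to_add(supernatant_ml)
    nacl_m = final_concentration(2.5, peg_ml, supernatant_ml)
    print(f"add {peg_ml:.2f} ml stock; final NaCl {nacl_m:.2f} M")
    assert math.isclose(peg_ml, 1.0) and math.isclose(nacl_m, 0.5)
```

In other words, one volume of stock is added to four volumes of cleared supernatant, which also brings the NaCl contributed by the stock to 0.5 M.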

Table X: Sequences of EpCAM-specific binding modules 
Table X: Sequences of VEGF-specific binding modules 
Table X: Sequences of TrkA-specific binding modules 

KTWDCRNSGHCVITFK    SNG0035S3.074 
ATWDCRDHNFSCVRLS    SNG0035S3.089 

Example: aEpCAM drug conjugates 
[00266] Anti-EpCAM peptides were isolated from random peptide libraries that were generated according to 
Scholle, et al. [Scholle, M. D., et al. (2005) Comb Chem High Throughput Screen, 8: 545-51]. The naive 
peptide libraries displayed cysteine-constrained peptides with cysteines spaced by 4 to 10 random residues. 
After three rounds of affinity selection with the above libraries, several EpCAM specific peptide ligands 
(EpCam1) were isolated (Table X). The EpCam1 isolates have a conserved cysteine spacing of four amino 
acids (CXXXXC). EpCam1 peptide ligands were then softly randomized (except cysteine positions) with 
codons encoding 3-9 residues and moved into a phagemid vector. Phagemid libraries were subsequently 
affinity selected against EpCAM to isolate peptide ligands optimized for binding (Table X, EpCam2). 
EpCam2 ligands contain the conserved CXXXXC cysteine spacing. In addition, the majority of 
anti-EpCam sequences do not contain a lysine residue, which allows for conjugation to free amine groups 
outside of the binding sequences. Furthermore, anti-EpCam peptide ligands can be genetically fused to 
URP sequences (of any length) and multimerized using iterative dimerization. The resulting anti-EpCAM 
MURPs can be used to specifically target EpCAM with increased affinity over monomer sequences. An 
example of a tetramer EpCAM-URP amino acid sequence is shown in Fig. 31. This sequence contains 
only two lysine residues that are located in the N-terminal Flag-tag. The side chains of these lysine 
residues are particularly suitable for drug conjugation. 

Table X: Anti-EpCam sequences 

Name      Sequence 
EpCam1    LRCWGMLCYA 
          LRCIGQICWR 
          LKCLYNICWV 
          LFCWGNVCHF 
          LTCWGQVCFR 
          RPGMACSGQLCWLNSP 
          PHALQCYGSLCWPSHL 
          RAGITCHGHLCWPITD 
          RPALKCIGTLCSLANP 
          PHGLWCHGSLCHYPLA 
          PHGLICAGSICFWPPP 
          PRNLTCYGQICFQSQH 
          PHNLACQNSICVRLPR 
          PHGLTCTNQICFYGNT 
EpCam2    HSLTCYGQICWVSNI 
          PTLTCYNQVCWVNRT 
          PALRCLGQLCWVTPT 
          PGLRCLGTLCWVPNR 
          RNLTCWNTVCYAYPN 
          RGLKCLGQLCWVSSN 
          PTLKCSGQICWVPPP 
          RNLECLGNVCSLLNQ 
          PTLTCLNNLCWVPPQ 
          RGLKCSGHLCWVTPQ 
          HGLTCHNTVCWVHHP 
          HTLECLGNICWVINQ 
          HGLTCYNQICWAPRP 
          HGLACYNQLCWVNPH 
          RGLACQGNICWRJLNP 
          RAITCLGTLCWPTSP 
          LTLECIGNICYVPHH 
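
Two properties discussed in this example, the conserved CXXXXC spacing and the absence of lysine, are easy to check programmatically. The sketch below is illustrative; the function name is hypothetical and only a handful of sequences from the table are used as input.

```python
import re

# Two cysteines separated by exactly four non-cysteine residues.
CX4C = re.compile(r"C[^C]{4}C")


def describe(seq: str) -> dict[str, bool]:
    """Report the cysteine spacing motif and lysine content of a peptide."""
    return {"has_cx4c": bool(CX4C.search(seq)), "has_lysine": "K" in seq}


if __name__ == "__main__":
    # A few entries taken from the anti-EpCam table.
    examples = ["LRCWGMLCYA", "LKCLYNICWV", "HSLTCYGQICWVSNI", "RGLKCLGQLCWVSSN"]
    for seq in examples:
        print(seq, describe(seq))
```

Lysine-free binders are the ones for which amine-directed conjugation chemistry would be confined to residues outside the binding sequence.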


Example: Random sequence addition 
[00267] Binding modules can be affinity matured, or lengthened, by the addition of URP-like linkers and random 
sequence to the N-terminus, C-terminus, or both N- and C-terminus of the binding sequence. Fig. 32 
shows the addition of naive cysteine-constrained sequences to an anti-EpCAM binding module. Libraries 
of random sequence additions can be generated using single-stranded or double-stranded DNA cloning 
approaches. Once generated, libraries can be affinity selected against the initial target protein or a second 
protein. For example, an addition library that contains an anti-EpCAM binding module can be used to 
select sequences that contain 2 or more binding sites to the target protein. 
Example: Construction of a 2SS buildup library 

[00268] A series of oligonucleotides was designed to construct a library based on the VEGF-binding 1SS peptide 
FTCTNHWCPS. The oligonucleotides incorporate variations in cysteine distance patterns of the flanking 
sequences while the VEGF-binding peptide sequence was kept fixed. 

Forward oligos: 

[00269] LMS70-1 CAGGCAGCGGGCCCGTCTGGCCCGTGYTTTA CITGTACGAATCATTGGTGTC(^ 
[00270] LMS70-2 CAGGCAGCGGGCCCGTCTGGCCCGTGYNNKTTTACTTGTACGAATCATTGGTGTCCT 
[00271] LMS70-3 

CAGGCAGCGGGCCCGTCTGGCCCGTGYNNICISNKTTTA CTTGTACGAATCATTGGTGTC 
[00272] LMS70-4 

CAGGCAGCGGGCCCGTCTGGCCCGTGYNHTI^^ 
[00273] LMS70-5 

CAGGCAGCGGGCCCGTCTGGCCCGTGYNHTNHT 
TCCT 

[00274] LMS70-6 

CAGGCAGCGGGCCCGTCTGGCCCGTGYKMTKMTKMTKMTKM 
TGGTGTCC 

Reverse oligos (reverse complemented): 

[00275] LMS70-1R ACCGGAACCACCAGACTGGCCRCACGAAGGACACCAATGATTCGTACAA 
[00276] LMS70-2R ACCGGAACCACCAGACTGGCCRCAMNNCGAAGGACACCAATGATTCGTACAA 
[00277] LMS70-3R 

ACCGGAACCACCAGACTGGCCRCAMNNMNNCGAAGGACACCAATGATTCGTACAA 
[00278] LMS70-4R 

ACCGGAACCACCAGACTGGCCRCAADNADNADNCGAAGGACACCAATGATTCGTACAA 
[00279] LMS70-5R 

ACCGGAACCACCAGACTGGCCRCAADNADNADNADNCGAAGGACACCAATGATTCGTACAA 
[00280] LMS70-6R 

ACCGGAACCACCAGACTGGCCRCAAKMAKMAKMAKMAKM 
TACAA 

Oligo dilutions 

[00281] Mixture 1 (from 100 µM stocks): 100 µl 70-6, 33 µl 70-5, 11 µl 70-4, 3.66 µl 70-3, 1.2 µl 70-2, 0.4 µl 70-1. 

Mixture 2 (from 100 µM stocks): 100 µl 70-6R, 33 µl 70-5R, 11 µl 70-4R, 3.66 µl 70-3R, 1.2 µl 70-2R, 
0.4 µl 70-1R. 
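
Because all stocks are at the same concentration, the volumes in each mixture translate directly into molar fractions. The short calculation below makes the roughly three-fold steps between successive oligonucleotides explicit. The reason for this weighting is not stated in the text; one plausible reading, offered here only as an assumption, is that oligonucleotides with more randomized positions are given proportionally more weight.

```python
import math

# Volumes (in microliters) of Mixture 1, all taken from 100 uM stocks.
MIXTURE_1_UL = {"70-6": 100.0, "70-5": 33.0, "70-4": 11.0,
                "70-3": 3.66, "70-2": 1.2, "70-1": 0.4}


def molar_fractions(volumes_ul: dict[str, float]) -> dict[str, float]:
    """Molar fraction of each oligo when all stocks share one concentration."""
    total = sum(volumes_ul.values())
    return {name: vol / total for name, vol in volumes_ul.items()}


if __name__ == "__main__":
    fractions = molar_fractions(MIXTURE_1_UL)
    for name, frac in fractions.items():
        print(f"{name}: {100 * frac:.2f}% of the mixture")
    vols = list(MIXTURE_1_UL.values())
    ratios = [a / b for a, b in zip(vols, vols[1:])]
    print("step ratios:", [round(r, 2) for r in ratios])
    assert math.isclose(sum(fractions.values()), 1.0)
```

The same calculation applies unchanged to Mixture 2, which uses identical volumes of the reverse oligonucleotides.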
PCR assembly 
[00282] 10.0 µl Template Oligo (5 µM), 10.0 µl 10X Buffer, 2.0 µl dNTPs (10 mM), 1.0 µl cDNA Polymerase 
(Clontech), 77 µl DS H2O. PCR program: 95°C 1 min, (95°C 15 sec, 54°C 30 sec, 68°C 15 sec) x5, 68°C 

PCR amplification 

[00283] Primers, 10.0 µl Assembled mixture, 10.0 µl 10X buffer, 2.0 µl dNTPs (10 mM), 10.0 µl LIBPTF (5 µM), 10.0 
µl LIBPTR (5 µM), 1.0 µl cDNA polymerase (Clontech), 57 µl DS H2O. PCR program: 95°C 1 min, 
(95°C 15 sec, 54°C 30 sec, 68°C 15 sec) x25, 68°C 1 min. The product was purified by Amicon column 
Y10. The assembled product was digested with SfiI and BstXI and ligated into the phagemid vector 
pMP003. Ligation was performed overnight at 16°C in a MJ PCR machine. The ligation was then purified by 
EtOH precipitation and transformed into fresh competent ER2738 cells by electroporation. 
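
The two reaction mixes above can be checked with simple bookkeeping. The sketch below sums the component volumes and computes final concentrations. It assumes that all volumes are in microliters, including the dNTP volume, for which the source gives no unit; with that assumption each mix totals exactly 100 µl. The dictionary keys are descriptive labels, not reagent catalog names.

```python
import math

# Component volumes in microliters; the dNTP unit is an assumption (see above).
ASSEMBLY_UL = {"template oligo (5 uM)": 10.0, "10X buffer": 10.0,
               "dNTPs (10 mM)": 2.0, "cDNA polymerase": 1.0, "water": 77.0}
AMPLIFICATION_UL = {"assembled mixture": 10.0, "10X buffer": 10.0,
                    "dNTPs (10 mM)": 2.0, "LIBPTF (5 uM)": 10.0,
                    "LIBPTR (5 uM)": 10.0, "cDNA polymerase": 1.0, "water": 57.0}


def total_volume(mix_ul: dict[str, float]) -> float:
    """Total reaction volume in microliters."""
    return sum(mix_ul.values())


def final_concentration(stock: float, volume_ul: float, total_ul: float) -> float:
    """Final concentration of a component, in the units of the stock."""
    return stock * volume_ul / total_ul


if __name__ == "__main__":
    for name, mix in (("assembly", ASSEMBLY_UL), ("amplification", AMPLIFICATION_UL)):
        print(name, total_volume(mix), "ul")
    total = total_volume(AMPLIFICATION_UL)
    print("primer, uM:", final_concentration(5.0, 10.0, total))
    print("dNTPs, mM:", final_concentration(10.0, 2.0, total))
    assert math.isclose(total_volume(ASSEMBLY_UL), 100.0)
```

Under these assumptions each primer is present at 0.5 µM and the dNTPs at 0.2 mM in the amplification reaction.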

[00284] The resulting library was panned against VEGF as described below. Several isolates were identified that 
showed improved binding to VEGF relative to the 1SS starting sequence. Binding and expression data are 
shown in Fig. 38. Sequences and results of Western analysis of buildup clones are shown in Fig. 39. 
Example: Phage panning of Buildup libraries 

[00285] First round panning: 

[00286] 1) First round, coat 4 wells per library to be screened. Coat the well of a Costar 96-well ELISA plate with 
0.25 ug of VEGF121 antigen in 25 ul of PBS. Cover the plate with a plate sealer. Coating can be performed 
overnight at 4°C or for 1 h at 37°C. 

[00287] 2) After shaking out the coating solution, block the well by adding 150 ul of PBS/BSA 1%. Seal and 
incubate for 1 h at 37°C. 

[00288] 3) After shaking out the blocking solution, add 50 ul of freshly prepared phage (see library reamplification 
protocol) to the well. For the first round only, also add 5 ul of Tween 5%. Seal the plate and incubate for 2 
h at 37°C. 

[00289] In the meantime, inoculate 2 ml SB medium plus 2 ul of 5 mg/ml Tetracycline with 2 ul of an ER2738 
cell preparation and allow growth at 250 rpm and 37°C for 2.5 h. Grow 1 culture for each library that is 
screened including negative selections. Take all precautions to avoid contamination of the culture with 
phage. 

[00290] 4) Shake out the phage solution, add 150 ul of PBS/Tween 0.5 % to the well and pipette 5 times vigorously 
up and down. Wait 5 min, shake out, and repeat this washing step. In the first round, wash in this fashion 5 
times, in the second round 10 times, and in the third, fourth and fifth round 15 times. 

[00291] 5) After shaking out the final washing solution, add 50 ul of freshly prepared 10 mg/ml trypsin in PBS, 
seal, and incubate for 30 min at 37°C. Pipette 10 times vigorously up and down and transfer the eluate (4 x 
50 ul in the first round, 2 x 50 ul in the second round, 1 x 50 ul in the subsequent rounds) to the prepared 
2-ml E. coli culture and incubate at room temperature for 15 min. 

[00292] 6) Add 6 ml of pre-warmed SB medium, 1.6 ul of carbenicillin and 6 ul of 5 mg/ml Tetracycline. Transfer 
the culture into a 50-ml polypropylene tube. 

[00293] 7) Shake the 8-ml culture at 250 rpm and 37°C for 1 h, add 2.4 ul 100 mg/ml carbenicillin, and shake for an 
additional hour at 250 rpm and 37°C. 

[00294] 8) Add 1 ml of VCSM13 helper phage and transfer to a 500-ml polypropylene centrifuge bottle. Add 91 ml 
of pre-warmed (37°C) SB medium and 46 ul of 100 mg/ml carbenicillin and 92 ul of 5 mg/ml Tetracycline. 
Shake the 100-ml culture at 300 rpm and 37°C for 1 1/2 to 2 h. 

[00295] 9) Add 140 ul of 50 mg/ml kanamycin and continue shaking at 300 rpm and 37°C overnight. 
[00296] 10) Spin at 4000 rpm for 15 min at 4°C. Transfer the supernatant to a clean 500-ml centrifuge bottle and 
add 25 ml of 20% PEG-8000/NaCl 2.5M. Store on ice for 30 min. 

[00297] 11) Spin at 9000 rpm for 15 min at 4°C. Discard the supernatant, drain inverted on a paper towel for at least 
10 min, and wipe off remaining liquid from the upper part of the centrifuge bottle with a paper towel. 
[00298] 12) Resuspend the phage pellet in 2 ml of PBS/BSA 0.5%/Tween 0.5% buffer by pipetting up and down 
along the side of the centrifuge bottle and transfer to a 2-ml microcentrifuge tube. Resuspend further by 
pipetting up and down using a 1-ml pipette tip, spin at full speed in a microcentrifuge for 1 min at 4°C, and 
pass the supernatant through a 0.2-um filter into a sterile 2-ml microcentrifuge tube. 

[00299] 13) Continue from step 3) for the next round or store the phage preparation at 4°C. Sodium azide may be 
added to 0.02% (w/v) for long-term storage. Only freshly prepared phage should be used for each round. 

[00300] Second round panning 

[00301] Second round, coat 2 wells per library to be screened. Coat the well of a Costar 96-well ELISA plate with 
0.25 ug of VEGF121 antigen in 25 ul of PBS. Cover the plate with a plate sealer. Coating can be performed 
overnight at 4°C or for 1 h at 37°C. 

[00302] Also block 2 uncoated wells for each library to be used as negative control for the enrichment ratio 
calculation. 
[00303] Third round panning 

[00304] Third round, coat 1 well per library to be screened. Coat the well of a Costar 96-well ELISA plate with 0.25 
ug of VEGF121 antigen in 25 ul of PBS. Cover the plate with a plate sealer. Coating can be performed 
overnight at 4°C or for 1 h at 37°C. 

[00305] Also block 1 uncoated well for each library to be used as negative control for the enrichment ratio 
calculation. 
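The per-round settings of the plate panning protocol (wells coated, number of washes) and the enrichment ratio can be summarized in a short Python sketch. The text does not define the enrichment ratio; the sketch assumes it is the phage output recovered from antigen-coated wells divided by the output from the blocked, uncoated control wells, and it assumes one coated well for every round after the third. All function names are illustrative.

```python
def wells_for_round(round_number: int) -> int:
    """Coated wells per library: 4, 2, then 1 (assumed for rounds >= 3)."""
    if round_number < 1:
        raise ValueError("round_number must be >= 1")
    if round_number == 1:
        return 4
    if round_number == 2:
        return 2
    return 1


def washes_for_round(round_number: int) -> int:
    """Number of wash cycles: 5, 10, then 15 for rounds 3 to 5."""
    if round_number < 1:
        raise ValueError("round_number must be >= 1")
    if round_number == 1:
        return 5
    if round_number == 2:
        return 10
    return 15


def enrichment_ratio(output_coated: float, output_uncoated: float) -> float:
    """Assumed definition: phage output from coated wells over uncoated wells."""
    if output_uncoated <= 0:
        raise ValueError("control output must be positive")
    return output_coated / output_uncoated
```

A ratio well above 1 in later rounds would suggest antigen-dependent enrichment, whereas a ratio near 1 would suggest binding to the blocked plastic.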

Example: Solution-based panning: 
[00306] 1. Biotinylate the target protein according to the manufacturer's instructions. 
[00307] 2. Coat a total of 8 wells (per selection) with 1.0 ug of neutravidin (Pierce) in PBS and incubate 
overnight at 4°C. 

[00308] 3. Block the wells with SuperBlock (Pierce) for 1 h at room temp. Store plate with blocking buffer until 
needed (in Step 6). 

[00309] 4. Use 100 nM of biotinylated target protein and add 10^12 phage/ml (in PBST) for a total volume of 100-200 ul 
using SuperBlock plus Tween 20 0.05%. 

[00310] 5. Tumble phage-target mixture at room temp for at least 1 h. 

[00311] 6. Dilute 100 ul phage-target mix with 700 ul SuperBlock, mix, and add 100 ul to each of 8 neutravidin-coated 
wells (from Step 3). 
[00312] 7. Incubate for 5 min at room temp. 
[00313] 8. Wash 8X with PBST. 

[00314] 9. Elute phage with 100 ul of 100 mM HCl for 10 min. 
[00315] 10. Neutralize by adding 10 ul of 1M TRIS pH=3.0. 

[00316] 11. Infect cells for plating or amplify phage for a subsequent round of solution panning. 
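To put the quantities in step 4 on a common scale, the phage titer can be converted to a molar concentration and compared with the 100 nM target. The sketch assumes a titer of 10^12 phage particles per ml; the function and constant names are illustrative.

```python
AVOGADRO = 6.02214076e23  # particles per mole


def particles_per_ml_to_nanomolar(particles_per_ml: float) -> float:
    """Convert a particle titer (per ml) to a molar concentration in nM."""
    particles_per_liter = particles_per_ml * 1e3
    return particles_per_liter / AVOGADRO * 1e9


PHAGE_NM = particles_per_ml_to_nanomolar(1e12)   # about 1.66 nM
TARGET_EXCESS = 100.0 / PHAGE_NM                 # molar excess of 100 nM target
```

Under this assumption the biotinylated target is present at roughly a 60-fold molar excess over phage particles.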
Example: Screening by Phage ELISA for VEGF positive clones 

[00317] 1) Add 0.5 ml SB containing 50 ug/ml carbenicillin to a 96 deep well plate. Pick one colony and inoculate 
wells. 

[00318] 2) Shake the plate containing the bacterial cultures at 300 rpm o/n at 37°C. 
[00319] 3) Prepare 4 ng/ul target protein solution in PBS. Add 25 ul (100 ng) of protein to each well and incubate 
overnight at 4°C. 

[00320] 4) Shake out coated ELISA plates and wash 2x with PBS. Add 150 ul/well PBS+0.5% BSA (blocking 
buffer). Block for 1 h at RT. 
[00321] 5) Spin down microtube racks (3000 rpm; 20 min). 

[00322] 6) Prepare binding buffer (blocking buffer +0.5% Tween 20). Aliquot 135ul binding buffer per well in low 

protein-binding 96 well plate. 
[00323] 7) Shake out wells on ELISA plates and wash 2 times with PBST (PBS +0.5% Tween 20). 
[00324] 8) Dilute 15 ul phage from o/n cultures 1:10 in PBST, mix by pipetting, and transfer 30ul to each protein-coated 
well. Incubate 2h at RT with gentle shaking. 

[00325] 9) Wash plates 6 times with PBST. 

[00326] 10) Add 50 ul anti-M13-HRP 1:5000 in binding buffer to the wells. Incubate 30 min with gentle shaking at 
RT. 

[00327] 11) Wash the plates 4 times with PBST, followed by 2 times with H2O. 
[00328] 12) Prepare 6 ml of ABTS solution (5.88 ml of citrate buffer plus 120 ul ABTS and 2 ul H2O2). Aliquot 
50ul per well on each ELISA plate. 
[00329] 13) Incubate at RT and read O.D. at 405 nm using an ELISA plate reader at appropriate time points 
depending on the signal (up to 1 h). 
Example: Dimerization of binding modules 
[00330] Phage displayed libraries of 10e9 to 10e11 cyclic peptides with 4, 5, 6, 7, 8, 9, 10, 11 and 12 randomized or 
partially randomized amino acids between the disulfide-bonded cystines, and in some cases additional 
randomized amino acids on the outside of the cystine pair, were created by standard methods. Panning of 
these cyclic peptide libraries against a number of targets, including human VEGF, reliably yielded peptides 
that bound specifically to hVEGF and not to BSA, Ovalbumin or IgG. 
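For orientation, the theoretical sequence diversity of a loop with n fully randomized positions can be compared with the stated library sizes. The sketch below assumes all 20 amino acids are allowed at every randomized position, which overstates the diversity of partially randomized designs and ignores codon bias and sampling statistics; the function names are illustrative.

```python
def peptide_diversity(n_randomized: int) -> int:
    """Number of distinct sequences for n fully randomized positions."""
    return 20 ** n_randomized


def max_fully_sampled_length(library_size: int) -> int:
    """Largest n for which 20**n does not exceed the library size."""
    n = 0
    while 20 ** (n + 1) <= library_size:
        n += 1
    return n
```

Under these assumptions, a library of 10^9 clones could at best contain every sequence for 6 randomized positions, and a library of 10^11 clones for 8 positions; longer loops are necessarily sampled only sparsely.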
Example: Construction and panning of a plexin-based library 

[00331] Two libraries were designed based on the Plexin scaffold. The Pfam protein database was used for 
phylogenetic alignment of naturally occurring plexin domains as shown in Fig. 35. The middle part of the 
plexin scaffold (Cys24-Gly25-Trp26-Cys27) is conserved in both library designs and served as a crossover 
region for N- and C- library generation. The randomization schemes of both plexin libraries are shown in 
Fig. 36. The two libraries were generated by overlapping two library-encoding oligos at the crossover 
region and using pull-thru PCR followed by restriction cloning (SfiI/BstXI) into phagemid 
vector pMP003. The resulting plexin libraries were designated LMP031 (N terminal library) and LMP032 
(C terminal library) and each was represented by a complexity of approximately 5 x 1 0 s independent 
transformants. For validation, approximately 24 Carb-resistant clones from each unselected library were 
analyzed by PCR. Clones that gave a correct size fragment (375 bp) were further analyzed by DNA 
sequencing. Correct full-length plexin sequences were obtained for 50% and 67% of clones derived from 
LMP031 and LMP032 libraries, respectively. 
[00332] The two libraries were mixed together at 50/50 ratio and panned in parallel against VEGF, death receptor 
DR4, ErbB2, and HGFR immobilized on 96-well ELISA plates. Four rounds of panning were carried out 
using 1000 ng of protein target in the first round, 500 ng in the second round, 250 ng in the third round, and 
100 ng in the fourth round. After the final round of panning, 192 Carb-resistant clones from each selection 
were analyzed for binding to 100 ng immobilized protein target, human IgG, Ovalbumin, and BSA by 

WO 2007/103515 



PCT/US2007/005952 



phage ELISA using polyclonal anti-M13 Ab conjugated to horseradish peroxidase for detection. The 
highest percentage of positive clones was obtained for target DR4 (69%), followed by target ErbB2 (53%), 
HGFR (13%), and BoNT target (1%). Positive clones were further analyzed by PCR and by DNA 
sequencing. All clones revealed unique sequences and all but one (against DR4) were derived from 
LMP032 (C terminal library). Sequences of some of the identified target-selective isolates are shown in 
Fig. 37. 
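The text reports percentages of positive clones without stating how a clone was scored as positive. The sketch below shows one possible scoring rule: a clone counts as target-specific when its ELISA signal on the target exceeds a chosen multiple of its highest signal on the control proteins (human IgG, Ovalbumin, BSA). The threshold of 3 and the function names are hypothetical choices for illustration, not values taken from the source.

```python
def is_specific_binder(od_target: float, od_controls: dict, min_ratio: float = 3.0) -> bool:
    """Hypothetical rule: target signal >= min_ratio x the highest control signal."""
    if not od_controls:
        raise ValueError("at least one control reading is required")
    return od_target >= min_ratio * max(od_controls.values())


def percent_positive(n_positive: int, n_screened: int) -> float:
    """Percentage of screened clones that were scored positive."""
    if n_screened <= 0:
        raise ValueError("n_screened must be positive")
    return 100.0 * n_positive / n_screened
```

Scoring against several unrelated control proteins, as in the screening described above, helps separate target-specific clones from clones that bind plastic or proteins non-specifically.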

[00333] For further analysis, an assortment of selected target-specific binders is first subcloned into protein 
expression vector pVS001, then produced as soluble microproteins, and finally purified by heat lysis. The 
purified target-specific microproteins are analysed by protein ELISA to confirm the target recognition, by 
SDS-PAGE to confirm monomer formation, and by surface plasmon resonance to measure their affinities 
to target. The best clones are used in the next round of library generation to further improve their 
properties. 

Example: Construction of a snake toxin-based library 
[00334] Phage displayed libraries of 10e8 to 10e10 of 3 finger toxin (3FT) scaffolds with partially randomized 
amino acids of fingertip 1 and descending part of finger 2 or fingertip 3 and ascending part of finger 2 
were created by standard methods. 
[00335] Two 3FT scaffolds were used as a template for 3FT library generation (fingers 1 and 2 configuration). The 
structure of a 3FT scaffold and a multiple sequence alignment of related sequences are shown in Fig. 33. A 
library was designed such that two surface loops of the toxin are randomized as illustrated in Fig. 34. The 
library of partially randomized 3FT scaffold was generated by overlapping four library-encoding oligos at 
the annealing regions and using pull-thru PCR followed by restriction cloning (SfiI/BstXI) into phagemid 
vector pMP003. The resulting 3FT library was designated LMP041. 

Example: Grafting of binding peptides into microprotein scaffolds - target-specific peptide-assisted 
randomization 

[00336] The aim here is to use the peptides that have been identified to be specific for a target of interest in order to 
generate 3SSplus target-specific binders. This strategy is illustrated by transferring a VEGF-specific peptide 
into fingertip 1 of the 3FT scaffold and by modifying the AA residues of finger 2, which are in close 
proximity to the target-specific sequence, to generate high affinity VEGF binders. Phage displayed libraries 
of 10e8 to 10e10 of 3 finger toxin (3FT) scaffolds with a VEGF-specific sequence in fingertip 1 and a partially 
randomized descending part of finger 2 were created by standard methods as described in the example above, 
except that 2 random finger 1 forward primers were replaced by an F1-VEGF-specific forward primer encoding 
the following sequence: PSGPSCHTTNHWPISAVTCPP. 
[00337] The focused (VEGF-specific) 3FT scaffold library with partially randomized finger 2 was generated by 
overlapping four library-encoding oligos at the annealing regions and using pull-thru PCR followed by 
restriction cloning (SfiI/BstXI) into phagemid vector pMP003. The resulting 3FT library was designated 
LMP042. 

Example: Plasma half-life of an MURP 
[00338] The plasma half-life of MURPs can be measured after i.v. or i.p. injection of the MURP into catheterized 
rats essentially as described by [Pepinsky, R. B., et al. (2001) J Pharmacol Exp Ther, 297: 1059-66]. 
Blood samples can be withdrawn at various time points (5 min, 15 min, 30 min, 1 h, 3 h, 5 h, 1 d, 2 d, 3 d) and 
the plasma concentration of the MURP can be measured using ELISA. Pharmacokinetic parameters can be 
calculated using WinNonlin version 2.0 (Scientific Consulting Inc., Apex, NC). To analyze the effect of 
the URP module one can compare the plasma half-life of a protein containing the URP module with the 
plasma half-life of the same protein lacking the URP module. 
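As a simplified illustration of the half-life comparison, a plasma half-life can be estimated from concentration-time data by a log-linear least-squares fit. This sketch assumes single-exponential elimination and is not a substitute for the compartmental or non-compartmental analysis performed by dedicated pharmacokinetic software; the function names and the example data are hypothetical.

```python
import math


def half_life_hours(times_h, concentrations):
    """Half-life from a least-squares fit of ln(concentration) versus time."""
    if len(times_h) != len(concentrations) or len(times_h) < 2:
        raise ValueError("need at least two matching time/concentration points")
    logs = [math.log(c) for c in concentrations]
    n = len(times_h)
    mean_t = sum(times_h) / n
    mean_y = sum(logs) / n
    sxx = sum((t - mean_t) ** 2 for t in times_h)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(times_h, logs))
    slope = sxy / sxx  # equals minus the elimination rate constant
    if slope >= 0:
        raise ValueError("concentrations do not decline over time")
    return math.log(2) / -slope


def fold_extension(half_life_with_urp, half_life_without_urp):
    """Ratio of half-lives for the protein with and without the URP module."""
    return half_life_with_urp / half_life_without_urp
```

For example, synthetic data generated with a 6 h half-life are recovered by the fit, and a protein with a 6 h half-life compared with a 1.5 h control would show a 4-fold extension.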
Example: Solubility testing of an MURP 

[00339] Solubility of MURPs can be determined by concentrating purified samples of MURPs in physiological 
buffers like phosphate buffered saline to various concentrations in the range of 0.01 mg/ml to 10 mg/ml. 
Samples can be incubated for up to several weeks. Samples where the concentration exceeds the solubility 
of the MURP show precipitation as indicated by turbidity, which can be measured in an absorbance reader. 
One can remove precipitated material by centrifugation or filtration and measure the concentration of 
remaining protein in the supernatant using a protein assay like the Bradford assay or by measuring the 
absorbance at 280 nm. Solubility studies can be accelerated by freezing the samples at -20°C and 
subsequent thawing. This process frequently leads to the precipitation of poorly soluble proteins. 
Example: Serum binding activity of MURPs 

[00340] One can coat MURPs of interest into microtiter plates and control proteins in other wells of the plate. 
Subsequently, one can add serum samples of interest to the wells for 1 hour. Subsequently, the wells can be 
washed with a plate washer. Bound serum proteins can be detected by adding antibodies against serum 
proteins that have been conjugated with enzymes like horseradish peroxidase or alkaline phosphatase for 
detection. Another way to detect serum binding to MURPs is to add the MURP of interest to serum for about 
1 hour to allow binding. Subsequently, one can immunoprecipitate the MURP using an antibody against an 
epitope in the MURP sequence. The precipitated samples can be analyzed by PAGE and optionally by 
Western to detect any proteins that co-precipitated with the MURP. One can identify the serum proteins 
that show co-precipitation by mass spectrometry. 
CLAIMS 

WHAT IS CLAIMED IS: 

1. An unstructured recombinant polymer (URP) comprising at least 40 contiguous amino acids, wherein 
said URP is substantially incapable of non-specific binding to a serum protein, and wherein 

(a) the sum of glycine (G), aspartate (D), alanine (A), serine (S), threonine (T), glutamate (E) and proline 
(P) residues contained in the URP, constitutes more than about 80% of the total amino acids of the URP; and/or 

(b) at least 50% of the amino acids are devoid of secondary structure as determined by Chou- 
Fasman algorithm. 

2. An unstructured recombinant polymer (URP) comprising at least 40 contiguous amino acids, wherein 
said URP has an in vitro serum degradation half-life greater than about 24 hours, and wherein 

(a) the sum of glycine (G), aspartate (D), alanine (A), serine (S), threonine (T), glutamate (E) and proline 
(P) residues contained in the URP, constitutes more than about 80% of the total amino acids of the URP; and/or 

(b) at least 50% of the amino acids are devoid of secondary structure as determined by Chou-Fasman 
algorithm. 

3. The URP of claim 1 or 2, wherein the URP comprises a non-natural amino acid sequence. 

4. The URP of claim 1 or 2, wherein the URP is selected for incorporation into a heterologous protein, 
and wherein upon incorporation of the URP into a heterologous protein, said heterologous protein exhibits a longer 
serum half-life and/or higher solubility as compared to the corresponding protein that is deficient in said URP. 

5. The URP of claim 1 or 2, wherein upon incorporation of the URP into a heterologous protein, said 
heterologous protein exhibits a serum secretion half-life that is at least two times longer as compared to the 
corresponding protein that is deficient in said URP. 

6. The URP of claim 1 or 2, wherein incorporation of the URP into a heterologous protein results in at 
least a 2-fold increase in apparent molecular weight of the protein as approximated by size exclusion 
chromatography. 

7. The URP of claim 1 or 2, wherein the URP has a Tepitope score less than -4. 

8. The URP of claim 1 or 2, wherein the amino acids are predominantly hydrophilic residues. 

9. The URP of claim 1 or 2, wherein at least 50% of the amino acids of the URP are devoid of secondary 
structure as determined by Chou-Fasman algorithm. 

10. The URP of claim 1 or 2, wherein glycine residues contained in the URP constitute at least about 50% 
of the total amino acids of the URP. 

11. The URP of claim 1 or 2, wherein any one type of the amino acids alone selected from the group 
consisting of glycine (G), aspartate (D), alanine (A), serine (S), threonine (T), glutamate (E) and proline (P) 
constitutes more than about 20% of the total amino acids of the URP. 

12. The URP of claim 1 or 2, wherein any one type of the amino acids alone selected from the group 
consisting of glycine (G), aspartate (D), alanine (A), serine (S), threonine (T), glutamate (E) and proline (P) 
constitutes more than about 40% of the total amino acids of the URP. 

13. The URP of claim 1 or 2, wherein the URP comprises more than about 100 contiguous amino acids. 

14. The URP of claim 1 or 2, wherein the URP comprises more than about 200 contiguous amino acids. 

15. The URP of claim 1 or 2 comprising repeat sequences. 
16. The URP of claim 1 or 2, wherein one type of the amino acids alone selected from the group consisting 
of glycine (G), aspartate (D), alanine (A), serine (S), threonine (T), glutamate (E), and proline (P), constitutes more 
than 50% of the total amino acids of the URP. 

17. A protein comprising one or more URPs of claim 1 or 2, wherein said one or more URPs are 
heterologous with respect to the protein. 

18. The protein of claim 17 comprising an effector module. 

19. The protein of claim 17 comprising a binding module. 

20. The protein of claim 17 comprising an effector module and a binding module. 

21. The protein of claim 17 comprising a plurality of binding modules, wherein the individual 
binding modules exhibit binding specificities to the same or different targets. 

22. The protein of claim 17, wherein the total length of URPs in aggregation exceeds about 150 amino 

acids. 

23. The protein of claim 17 comprising one or more binding modules, wherein the binding module 
comprises a disulfide-containing scaffold formed by intra-scaffold pairing of cysteines. 

24. The protein of claim 17 comprising an effector module which is cytotoxic. 

25. The protein of claim 17 comprising a binding module specific for a target molecule, wherein the target 
is selected from the group consisting of cell surface protein, secreted protein, cytosolic protein, and nuclear protein. 

26. The protein of claim 17 comprising a binding module specific for a target molecule, wherein the target 
is an ion channel. 

27. The protein of claim 17 exhibiting a serum secretion half-life extended by at least 2-fold as 
compared to a corresponding protein that is deficient in said URP. 

28. A non-naturally occurring protein comprising at least 3 repeating units of amino acid sequences, each 
of the repeating units comprising at least 6 amino acids, wherein the majority of segments comprising about 6 to 
about 15 contiguous amino acids of the at least 3 repeating units are present in one or more native human proteins. 

29. The protein of claim 28, wherein the majority of segments comprising about 9 to about 15 contiguous 
amino acids within the repeating units are present in one or more native human proteins. 

30. The protein of claim 28, wherein each segment comprising about 6 to about 15 contiguous amino acids 
within the protein, is present in at least one native human protein. 

31. The protein of claim 28, wherein each segment comprising about 9 to about 15 contiguous amino acids 
within the protein, is present in at least one native human protein. 

32. The protein of claim 28, wherein the at least 3 repeating units share sequence identity of greater than 
about 80%. 

33. The protein of claim 28, wherein each of the repeating units comprises about 6 to about 15 contiguous 
amino acids. 

34. The protein of claim 28, wherein each of the repeating units comprises about 9 contiguous amino 
acids. 

35. The protein of claim 28 comprising one or more modules selected from the group consisting of binding 
modules, effector modules, multimerization modules, C-terminal modules, and N-terminal modules. 

36. The protein of claim 28, wherein an individual repeating unit comprises an unstructured recombinant 
polymer (URP). 

37. The protein of claim 36, wherein the URP comprises at least 40 contiguous amino acids, wherein said 
URP is substantially incapable of non-specific binding to a serum protein, and wherein 
(a) the sum of glycine (G), aspartate (D), alanine (A), serine (S), threonine (T), glutamate (E) and proline 
(P) residues contained in the URP, constitutes more than about 80% of the total amino acids of the URP; and/or 

(b) at least 50% of the amino acids are devoid of secondary structure as determined by Chou- 
Fasman algorithm. 

38. The protein of claim 36, wherein the URP comprises at least 40 contiguous amino acids, wherein said 
URP has an in vitro serum degradation half-life greater than about 24 hours, and wherein 

(a) the sum of glycine (G), aspartate (D), alanine (A), serine (S), threonine (T), glutamate (E) and proline 
(P) residues contained in the URP, constitutes more than about 80% of the total amino acids of the URP; and/or 

(b) at least 50% of the amino acids are devoid of secondary structure as determined by Chou-Fasman 
algorithm. 

39. A recombinant polynucleotide comprising a coding sequence that encodes the URP of claim 1 or 2. 

40. A recombinant polynucleotide comprising a coding sequence that encodes the protein of claim 17. 

41. A recombinant polynucleotide comprising a coding sequence that encodes the protein of claim 28. 

42. A host cell comprising the recombinant polynucleotide of claim 40. 

43. A host cell comprising the recombinant polynucleotide of claim 41. 

44. A vector comprising the recombinant polynucleotide of claim 40. 

45. A vector comprising the recombinant polynucleotide of claim 41. 

46. A selectable library of expression vectors comprising more than one vector of claim 44. 

47. A selectable library of expression vectors comprising more than one vector of claim 45. 
48. A genetic package displaying the library of claim 46. 

49. A genetic package displaying the library of claim 47. 

50. A method of producing a protein comprising an unstructured recombinant polymer (URP), comprising: 
(i) providing a host cell comprising a recombinant polynucleotide encoding the protein, said protein 
comprising one or more URPs, said URP comprising at least 40 contiguous amino acids, wherein said URP is 
substantially incapable of non-specific binding to a serum protein, and wherein 

(a) the sum of glycine (G), aspartate (D), alanine (A), serine (S), threonine (T), glutamate (E) and proline 
(P) residues contained in the URP, constitutes more than about 80% of the total amino acids of the URP; and/or 

(b) at least 50% of the amino acids are devoid of secondary structure as determined by Chou-Fasman 
algorithm; 

(ii) culturing said host cell in a suitable culture medium under conditions to effect expression of said 
protein from said polynucleotide. 

51. The method of claim 50 wherein the URP has an in vitro serum degradation half-life greater than about 
24 hours. 

52. The method of claim 50 wherein the host cell is a eukaryotic cell. 
53. The method of claim 50 wherein the host cell is a CHO cell. 

54. The method of claim 50 wherein the host cell is a prokaryotic cell. 

55. A method of increasing serum secretion half-life of a protein, comprising: 

fusing said protein with one or more unstructured recombinant polymers (URPs), wherein the 
URP comprises at least about 40 contiguous amino acids, and wherein 
(a) the sum of glycine (G), aspartate (D), alanine (A), serine (S), threonine (T), glutamate (E) and 
proline (P) residues contained in the URP, constitutes more than about 80% of the total amino acids of the URP; 
and/or 
(b) at least 50% of the amino acids are devoid of secondary structure as determined by Chou-Fasman 
algorithm; and wherein said URP is substantially incapable of non-specific binding to a serum protein. 

56. The method of claim 55 wherein the serum secretion half-life of the protein is extended by at least 
2-fold. 

57. A method of detecting the presence or absence of a specific interaction between a target and an 
exogenous protein that is displayed on a genetic package, wherein said protein comprises one or more unstructured 
recombinant polymers (URPs), the method comprising: 

(a) providing a genetic package displaying a protein that comprises one or more unstructured 
recombinant polymers (URPs); 

(b) contacting the genetic package with the target under conditions suitable to produce a stable 
protein-target complex; and 

(c) detecting the formation of the stable protein-target complex on the genetic package, thereby 

detecting the presence of a specific interaction. 

58. The method of claim 57 further comprising obtaining a nucleotide sequence from the genetic package 
that encodes the exogenous protein. 

59. The method of claim 57, wherein the presence or absence of a specific interaction is between the URP 
and a target comprising a serum protein. 

60. The method of claim 57, wherein the presence or absence of a specific interaction is between the URP 
and a target comprising a serum protease. 

61. The method of claim 57, wherein the target or the protein is selected from the group consisting of cell 
surface protein, secreted protein, cytosolic protein, and nuclear protein. 

62. The method of claim 57, wherein the genetic package is phage. 

63. The method of claim 57, wherein the URP comprises at least about 40 contiguous amino acids, and 
wherein the URP is substantially incapable of non-specific binding to a serum protein, and further wherein 

(a) the sum of glycine (G), aspartate (D), alanine (A), serine (S), threonine (T), glutamate (E) and proline 
(P) residues contained in the URP, constitutes more than about 80% of the total amino acids of the URP; and/or 

(b) at least 50% of the amino acids are devoid of secondary structure as determined by Chou-Fasman 
algorithm. 

64. The method of claim 57, wherein the URP comprises at least about 40 contiguous amino acids, and 
wherein the URP has an in vitro serum degradation half-life greater than about 24 hours. 

65. The method of claim 50, 55 or 57, wherein the URP comprises a non-natural amino acid sequence. 

66. The method of claim 50, 55 or 57, wherein the URP has a Tepitope score equal to or less than -4. 

67. The method of claim 50, 55 or 57, wherein the URP is devoid of secondary structure as determined by 
Chou-Fasman algorithm. 

68. The method of claim 50, 55 or 57, wherein glycine residues contained in the URP constitute at least 
about 50% of the total amino acids of the URP. 

69. The method of claim 50, 55 or 57, wherein the sum of glycine (G), aspartate (D), alanine (A), serine 

(S), threonine (T), glutamate (E), and proline (P) residues contained in the URP, constitutes more than about 60% of 

the total amino acids of the URP. 
70. The URP of claim 50 or 51, wherein one type of the amino acids selected from the group consisting of 
glycine (G), aspartate (D), alanine (A), serine (S), threonine (T), glutamate (E), and proline (P) constitutes more than 
50% of the total amino acids of the URP. 


71. The method of claim 50, 55 or 57, wherein the URP comprises more than 100 contiguous amino acids. 

72. The method of claim 50, 55 or 57, wherein the URP comprises repeat sequences. 

73. The method of claim 50, 55 or 57, wherein the protein is a therapeutic protein. 

74. The method of claim 50, 55 or 57, wherein the protein comprises one or more modules selected from 
the group consisting of binding modules, effector modules, multimerization modules, C-terminal modules, 
and N-terminal modules. 
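The compositional limitation recited in several of the claims (the combined share of G, D, A, S, T, E and P residues, and the glycine share) lends itself to a simple computational check. The sketch below is illustrative only: it treats "more than about 80%" as a strict 0.80 cutoff, does not evaluate the Chou-Fasman or serum-binding limitations, and uses hypothetical function names.

```python
URP_RESIDUES = frozenset("GDASTEP")


def residue_fraction(sequence: str, residues) -> float:
    """Fraction of positions in the sequence occupied by the given residues."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    return sum(1 for aa in seq if aa in residues) / len(seq)


def meets_composition_limit(sequence: str, min_length: int = 40,
                            min_fraction: float = 0.80) -> bool:
    """Check length and G/D/A/S/T/E/P content against assumed strict cutoffs."""
    if len(sequence) < min_length:
        return False
    return residue_fraction(sequence, URP_RESIDUES) > min_fraction


def glycine_fraction(sequence: str) -> float:
    """Fraction of glycine residues in the sequence."""
    return residue_fraction(sequence, frozenset("G"))
```

For instance, a 60-residue (GGS)n repeat passes the assumed length and composition cutoffs and has a glycine share of two thirds, whereas a sequence that is half lysine does not pass.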